Nucleic acids compositions conferring dwarfing phenotype

ABSTRACT

This invention relates to putative known and unknown deoxyribonucleic acid (DNA) and amino acid sequences identified in one or more metabolic pathways that lead to dwarfism and stunting in plants and the use of these sequences in agriculture to create dwarf varieties of any plant species. This invention also relates to nucleic acid sequences and polypeptides that produce altered metabolism phenotypes in plants.

This application is the National Stage of International Application No. PCT/US01/23120, filed Jul. 20, 2001, which claims the benefit under 35 U.S.C. 119(e) of U.S. Provisional Applications 60/219,809 filed Jul. 20, 2000 and 60/219,810 filed Jul. 20, 2000.

FIELD OF THE INVENTION

This invention relates to putative known and unknown deoxyribonucleic acid (DNA) and amino acid sequences identified in one or more metabolic pathways that lead to dwarfism and stunting in plants and the use of these sequences in agriculture to create dwarf varieties of any plant species. This invention also relates to nucleic acid sequences and polypeptides that alter metabolism in plants.

BACKGROUND OF THE INVENTION

The Green Revolution crops, introduced in the late 1960s and early 1970s, produce several times as much grain as the traditional varieties they replaced, and they spread rapidly. They enabled India to double its wheat crop in seven years, dramatically increasing food supplies and averting widely predicted famine. The Green Revolution's leading research achievement was to hasten the perfection of dwarf spring wheat. Though it is conventionally assumed that farmers want a tall, impressive-looking harvest, in fact shrinking wheat and other crops has often proved beneficial. Bred for short stalks, plants expend less energy on growing inedible column sections and more on growing valuable grain. Stout, short-stalked wheat also neatly supports its kernels, whereas tall-stalked wheat may bend over at maturity, complicating reaping. Nature has favored genes for tall stalks, because in nature plants must compete for access to sunlight. In high-yield agriculture equally short-stalked plants will receive equal sunlight. Researchers are seeking dwarf strains of rice and other crops in order to increase agronomic yields. Identification of genes and metabolic pathway modifications that can be used for creation of rapidly growing dwarf strains would be especially useful for grain and cereal crops and also for other agronomically important crops such as forest trees, ornamental species such as turfgrass, and plants such as Nicotiana sp. grown as hosts for biopharmaceutical manufacturing.

The discovery of putative known and unknown DNA and amino acid sequences identified in one or more metabolic pathways leading to dwarfism and stunting in plants satisfies a need in the art by providing new compositions which are useful in agriculture to create dwarf varieties of any plant species.

SUMMARY OF THE INVENTION

This invention relates to putative known and unknown deoxyribonucleic acid (DNA) and amino acid sequences identified in one or more metabolic pathways that lead to dwarfism and stunting in plants and the use of these sequences in agriculture to create dwarf varieties of any plant species. This invention also relates to nucleic acid sequences and polypeptides, the expression of which causes altered metabolism and fatty acid production in plants.

In some embodiments, the present invention provides a composition comprising a nucleic acid selected from the group consisting of SEQ ID NOs:1-571. The present invention is not limited to the particular nucleic acid encoded by these sequences. Indeed, it is contemplated that the present invention encompasses variants, homologs, and portions or fragments of these sequences. Therefore, in some embodiments, the present invention provides a composition comprising a nucleic acid sequence that hybridizes to a sequence selected from the group consisting of SEQ ID NOs:1-571 under conditions ranging from low to high stringency. In still further embodiments, the present invention provides a composition comprising a nucleic acid that inhibits or competes with the binding of a nucleic acid selected from the group consisting of SEQ ID NOs:1-571 to their complements. In other embodiments, the present invention provides a composition comprising a nucleic acid that hybridizes to a sequence selected from the group of SEQ ID NOs:1-571 and which confers a dwarfing phenotype or altered metabolism phenotype when expressed in a plant.

In still further preferred embodiments, the present invention provides a vector comprising a nucleic acid selected from the group consisting of SEQ ID NOs:1-571. The present invention is not limited to the particular nucleic acid encoded by these sequences. Indeed, it is contemplated that the present invention encompasses variants, homologs, and portions or fragments of these sequences. Therefore, in some embodiments, the present invention provides a vector comprising a nucleic acid sequence that hybridizes to a sequence selected from the group consisting of SEQ ID NOs:1-571 under conditions ranging from low to high stringency. In still further embodiments, the present invention provides a vector comprising a nucleic acid that inhibits or competes with the binding of a nucleic acid selected from the group consisting of SEQ ID NOs:1-571 to their complements. In other embodiments, the present invention provides a vector comprising a nucleic acid that hybridizes to a sequence selected from the group of SEQ ID NOs:1-571 and which confers a dwarfing or altered metabolism phenotype when expressed in a plant.

In some embodiments, the present invention comprises a plant transfected with the nucleic acids or vectors described above. In still further embodiments, the present invention comprises the seeds, leaves, or oil produced by the transfected plants.

In some preferred embodiments, the present invention provides a composition comprising a nucleic acid selected from the group consisting of SEQ ID NOs:43, 49, 52, 79, 94, and 151. The present invention is not limited to the particular nucleic acid encoded by these sequences. Indeed, it is contemplated that the present invention encompasses variants, homologs, and portions or fragments of these sequences. Therefore, in some embodiments, the present invention provides a composition comprising a nucleic acid sequence that hybridizes to a sequence selected from the group consisting of SEQ ID NOs:43, 49, 52, 79, 94, and 151 under conditions ranging from low to high stringency. In still further embodiments, the present invention provides a composition comprising a nucleic acid that inhibits or competes with the binding of a nucleic acid selected from the group consisting of SEQ ID NOs:43, 49, 52, 79, 94, and 151 to their complements. In other embodiments, the present invention provides a composition comprising a nucleic acid that hybridizes to a sequence selected from the group of SEQ ID NOs:43, 49, 52, 79, 94, and 151 and which confers a phenotype when expressed in a plant, the phenotype selected from the group consisting of dwarfing, alteration of fatty acid synthesis, and alteration of metabolism.

In still further preferred embodiments, the present invention provides a vector comprising a nucleic acid selected from the group consisting of SEQ ID NOs:43, 49, 52, 79, 94, and 151. The present invention is not limited to the particular nucleic acid encoded by these sequences. Indeed, it is contemplated that the present invention encompasses variants, homologs, and portions or fragments of these sequences. Therefore, in some embodiments, the present invention provides a vector comprising a nucleic acid sequence that hybridizes to a sequence selected from the group consisting of SEQ ID NOs:43, 49, 52, 79, 94, and 151 under conditions ranging from low to high stringency. In still further embodiments, the present invention provides a vector comprising a nucleic acid that inhibits or competes with the binding of a nucleic acid selected from the group consisting of SEQ ID NOs:43, 49, 52, 79, 94, and 151 to their complements. In other embodiments, the present invention provides a vector comprising a nucleic acid that hybridizes to a sequence selected from the group of SEQ ID NOs:43, 49, 52, 79, 94, and 151 and which confers a phenotype when expressed in a plant, the phenotype selected from the group consisting of dwarfing, alteration of fatty acid synthesis, and alteration of metabolism.

In some embodiments, the present invention comprises a plant transfected with the nucleic acids or vectors described above. In still further embodiments, the present invention comprises the seeds or oil produced by the transfected plants.

In some embodiments, the present invention provides methods comprising providing i) a vector comprising a nucleic acid sequence selected from the group consisting of SEQ ID NOs:43, 49, 52, 79, 94, and 151; ii) a host cell; and transfecting the host cell with the vector under conditions such that fatty acid synthesis by the host cell is altered. In some preferred embodiments, the host cells are part of a plant.

In other embodiments, the present invention provides methods comprising providing i) a vector comprising a nucleic acid sequence selected from the group consisting of nucleic acid sequences that hybridize to a sequence selected from the group consisting of SEQ ID NOs:43, 49, 52, 79, 94, and 151 under conditions of high stringency; ii) a host cell; and transfecting the host cell with the vector under conditions such that fatty acid synthesis by the host cell is altered. In some preferred embodiments, the host cells are part of a plant.

In some embodiments, the present invention provides methods comprising providing i) a vector comprising a nucleic acid sequence selected from the group consisting of SEQ ID NOs:43, 49, 52, 79, 94, and 151; ii) a host cell; and transfecting the host cell with the vector under conditions such that the metabolism of the host cell is altered. In some preferred embodiments, the host cells are part of a plant.

In other embodiments, the present invention provides methods comprising providing i) a vector comprising a nucleic acid sequence selected from the group consisting of nucleic acid sequences that hybridize to a sequence selected from the group consisting of SEQ ID NOs:43, 49, 52, 79, 94, and 151 under conditions of high stringency; ii) a host cell; and transfecting the host cell with the vector under conditions such that the metabolism of the host cell is altered. In some preferred embodiments, the host cells are part of a plant.

In further embodiments, the present invention provides methods for producing dwarf industrial crops. It is contemplated that these dwarf industrial crops will have increased yields as compared to non-dwarf industrial crops. Therefore, in some embodiments, the present invention provides methods comprising providing i) a nucleic acid selected from the group consisting of SEQ ID NOs: 1-154 and ii) a plant or plant tissue; and transfecting the plant or plant tissue with the nucleic acid such that it is expressed in at least a portion of the plant tissue or plant. The present invention is not limited to the particular sequences encoded by SEQ ID NOs: 1-154. Indeed, a variety of nucleic acid sequences are contemplated. In particular, the present invention encompasses nucleic acid sequences that bind to SEQ ID NOs: 1-154 under conditions of low to high stringency. In other embodiments, the present invention encompasses the use of nucleic acids that inhibit or compete with the binding of nucleic acids encoded by SEQ ID NOs: 1-154 and their complements. In some embodiments, the nucleic acid is contained within a vector. In other embodiments, the nucleic acid is operably linked to a constitutive or tissue specific promoter. When a tissue specific promoter is utilized, it is contemplated that the dwarfing effect will be confined substantially to the tissue where the nucleic acid is expressed. The present invention is not limited to any particular tissue specific promoter. Indeed, a variety of tissue specific promoters are contemplated, including, but not limited to, leaf, seed, stem, and root specific promoters. In other embodiments, the expression of the nucleic acid is increased as compared to wild-type plants. In still other embodiments, the nucleic acid is expressed under conditions such that a dwarf phenotype is observed. The present invention is not limited to any particular industrial crop. Indeed, a variety of industrial crops are contemplated. In preferred embodiments, the industrial crop is selected from corn, soybean, rice, wheat, oilseed rape, cotton, oats, barley, and potato.

In still further preferred embodiments, the present invention provides the plants produced from the above method. In some embodiments, the invention provides a plant comprising a nucleic acid corresponding to at least one of SEQ ID NOs: 1-154. The present invention is not limited to the particular sequences encoded by SEQ ID NOs: 1-154. Indeed, a variety of nucleic acid sequences are contemplated. In particular, the present invention encompasses nucleic acid sequences that bind to SEQ ID NOs: 1-154 under conditions of low to high stringency. In other embodiments, the present invention encompasses the use of nucleic acids that inhibit or compete with the binding of nucleic acids encoded by SEQ ID NOs: 1-154 and their complements. In some embodiments, the nucleic acid is contained within a vector. In further embodiments, the nucleic acid is operably linked to a constitutive or tissue specific promoter. When a tissue specific promoter is utilized, it is contemplated that the dwarfing effect will be confined substantially to the tissue where the nucleic acid is expressed. The present invention is not limited to any particular tissue specific promoter. Indeed, a variety of tissue specific promoters are contemplated, including, but not limited to, leaf, seed, stem, and root specific promoters. In other embodiments, the expression of the nucleic acid is increased as compared to wild-type plants. In still other embodiments, the nucleic acid is expressed under conditions so that a dwarf phenotype is observed. The present invention is not limited to any particular industrial crop. Indeed, a variety of industrial crops are contemplated. In preferred embodiments, the industrial crop is selected from corn, soybean, rice, wheat, oilseed rape, cotton, oats, barley, and potato.

In some embodiments, the present invention comprises a plant transfected with the nucleic acids or vectors described above. In still further embodiments, the present invention comprises the seeds or oil produced by the transfected plants.

In still further embodiments, the present invention provides nucleic acids corresponding to contigs and orthologs or homologs predicted from the sequences that cause a stunting or dwarfing phenotype. Accordingly, in some embodiments, the present invention provides a composition comprising a nucleic acid selected from the group consisting of SEQ ID NOs: 155-279 and 344-571. In still further embodiments, the present invention provides a vector comprising a nucleic acid selected from the group consisting of SEQ ID NOs: 155-279 and 344-571. The present invention is not limited to the contig sequences disclosed herein. The present invention also encompasses orthologs and homologs of the contig sequences. Therefore, the present invention provides compositions comprising a nucleic acid sequence that hybridizes to a sequence selected from the group consisting of SEQ ID NOs: 155-279 and 344-571 under conditions of low to high stringency. In other embodiments, the present invention provides a composition comprising a nucleic acid that inhibits the binding of a nucleic acid selected from the group consisting of SEQ ID NOs: 155-279 and 344-571 to their complementary sequences.

In other embodiments, the present invention provides plants transformed with the contig sequences or vectors comprising the contig sequences. The present invention is not limited to any particular plant. Indeed, a variety of plants are contemplated, including, but not limited to, corn, soybean, rice, wheat, oilseed rape, cotton, oats, barley, and potato plants.

In still other embodiments, the present invention provides methods comprising providing i) a vector comprising a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 155-279 and 344-571; ii) a host cell; and transfecting the host cell with the vector under conditions such that fatty acid synthesis by the host cell is altered. In some preferred embodiments, the host cells are part of a plant.

In still further embodiments, the present invention provides methods comprising providing i) a vector comprising a nucleic acid sequence selected from the group consisting of nucleic acid sequences that hybridize to a sequence selected from the group consisting of SEQ ID NOs: 155-279 and 344-571 under conditions of low to high stringency; and ii) a host cell; and transfecting the host cell with the vector under conditions such that fatty acid synthesis by the host cell is altered. In some embodiments, the present invention provides a composition comprising a nucleic acid that inhibits the binding of a nucleic acid selected from the group consisting of SEQ ID NOs: 155-279 and 344-571 to their complementary sequences.

The present invention also provides methods for decreasing the susceptibility of plants to insects and pests and increasing the resistance and tolerance of plants to insects and pests. It is contemplated that expression of the nucleic acids in plants can lead to insect tolerance or resistance by a variety of methods. In some instances, expression of the nucleic acid sequence results in the production of a polypeptide that is directly toxic to an insect. In other instances, resistance or tolerance is conferred through a secondary effect of expression of the nucleic acid (for example, expression results in the production of metabolic compounds, such as sterols, that are toxic to an insect).

In some embodiments, the present invention provides methods comprising providing i) a nucleic acid selected from the group consisting of SEQ ID NOs: 3, 150, 151, 26, 31, 36, 58, 78, 94, 106, 107, 110, 112, 113, 114, 117, 123; and ii) a plant having susceptibility to insects; and transfecting the plant with the nucleic acid sequence under conditions such that the susceptibility is reduced.

In other embodiments, the present invention provides methods comprising providing i) a nucleic acid that hybridizes to a sequence selected from the group consisting of SEQ ID NOs: 3, 150, 151, 26, 31, 36, 58, 78, 94, 106, 107, 110, 112, 113, 114, 117, 123 under conditions of low to high stringency; and ii) a plant having susceptibility to insects; and transfecting the plant with the nucleic acid sequence under conditions such that the susceptibility is reduced.

In further embodiments, the present invention provides methods comprising providing i) a vector comprising a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 3, 150, 151, 26, 31, 36, 58, 78, 94, 106, 107, 110, 112, 113, 114, 117, 123; and ii) a plant having susceptibility to insects; and transfecting the plant with the nucleic acid sequence under conditions such that the susceptibility is reduced.

In still further embodiments, the present invention provides methods comprising providing i) a vector comprising a nucleic acid sequence that hybridizes to a sequence selected from the group consisting of SEQ ID NOs: 3, 150, 151, 26, 31, 36, 58, 78, 94, 106, 107, 110, 112, 113, 114, 117, 123 under conditions of low to high stringency; and ii) a plant having susceptibility to insects; and transfecting the plant with the nucleic acid sequence under conditions such that the susceptibility is reduced.

In some embodiments, the present invention provides methods comprising providing i) a composition comprising a nucleic acid that hybridizes to a sequence selected from the group consisting of SEQ ID NOs: 3, 150, 151, 26, 31, 36, 58, 78, 94, 106, 107, 110, 112, 113, 114, 117, 123 under conditions of low to high stringency; and ii) a plant; and transfecting the plant with the composition under conditions such that the resistance of the plant to insects is increased.

In still other embodiments, the present invention provides methods comprising providing i) a composition comprising a nucleic acid that hybridizes to a sequence selected from the group consisting of SEQ ID NOs: 3, 150, 151, 26, 31, 36, 58, 78, 94, 106, 107, 110, 112, 113, 114, 117, 123 under conditions of low to high stringency; and ii) a plant; and transfecting the plant with the composition under conditions such that the tolerance of the plants to insects is increased.

In some embodiments, the nucleic acid sequences are operably linked to a promoter (for example, a constitutive or tissue specific promoter). The present invention is not limited to any particular tissue specific promoter. Indeed, a variety of tissue specific promoters are contemplated, including, but not limited to, leaf, seed, stem, and root specific promoters.

In some embodiments, the present invention provides a composition comprising a polypeptide encoded by nucleic acids selected from the group consisting of SEQ ID NOs: 3, 150, 151, 26, 31, 36, 58, 78, 94, 106, 107, 110, 112, 113, 114, 117, 123 and portions thereof. In still further embodiments, the present invention provides methods for controlling an insect comprising providing a composition comprising a polypeptide encoded by a nucleic acid selected from the group consisting of SEQ ID NOs: 3, 150, 151, 26, 31, 36, 58, 78, 94, 106, 107, 110, 112, 113, 114, 117, 123 and portions thereof; and orally introducing to an insect an insecticidally effective amount of the polypeptide.

In still further preferred embodiments, the present invention provides an isolated nucleic acid comprising one of SEQ ID NOs: 334, 280, 335, 336, 281, 282, 283, 284, 285, 286, 287, 288, 289, 338, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 339, 300, 301, 302, 303, 304, 306, 307, 308, 309, 310, 311, 312, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 326, 327, 328, 329, 330, 333, or 337 or sequences that hybridize to the foregoing sequences under conditions of low stringency, wherein expression of the nucleic acid in a plant results in an altered metabolism phenotype. In some preferred embodiments, the present invention provides a vector comprising one of the foregoing sequences. In particularly preferred embodiments, the nucleic acid is operably linked to an exogenous promoter, preferably a plant promoter. In some embodiments, the nucleic acid is in sense orientation while in other embodiments, the nucleic acid is in antisense orientation.

In still further embodiments, the present invention provides a plant transfected with a nucleic acid, composition, or vector comprising one of SEQ ID NOs: 334, 280, 335, 336, 281, 282, 283, 284, 285, 286, 287, 288, 289, 338, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 339, 300, 301, 302, 303, 304, 306, 307, 308, 309, 310, 311, 312, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 326, 327, 328, 329, 330, 333, or 337 or sequences that hybridize to the foregoing sequences under conditions of low stringency. In some embodiments, the present invention provides a seed from such a plant. In other embodiments, the present invention provides a leaf from such a plant.

In still further embodiments, the present invention provides a nucleic acid, composition, or vector comprising one of SEQ ID NOs: 334, 280, 335, 336, 281, 282, 283, 284, 285, 286, 287, 288, 289, 338, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 339, 300, 301, 302, 303, 304, 306, 307, 308, 309, 310, 311, 312, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 326, 327, 328, 329, 330, 333, or 337 or sequences that hybridize to the foregoing sequences under conditions of low stringency for use in altering the metabolism of a plant.

In some embodiments, the present invention provides methods and/or processes for making a transgenic plant comprising providing a vector comprising one of SEQ ID NOs: 334, 280, 335, 336, 281, 282, 283, 284, 285, 286, 287, 288, 289, 338, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 339, 300, 301, 302, 303, 304, 306, 307, 308, 309, 310, 311, 312, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 326, 327, 328, 329, 330, 333, or 337 or sequences that hybridize to the foregoing sequences under conditions of low stringency and a plant; and transfecting the plant with the vector. In other embodiments, the present invention provides methods and/or processes for altering the metabolism of a plant comprising providing a vector comprising one of SEQ ID NOs: 334, 280, 335, 336, 281, 282, 283, 284, 285, 286, 287, 288, 289, 338, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 339, 300, 301, 302, 303, 304, 306, 307, 308, 309, 310, 311, 312, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 326, 327, 328, 329, 330, 333, or 337 or sequences that hybridize to the foregoing sequences under conditions of low stringency and a plant; and transfecting the plant with the vector under conditions such that the metabolism of the plant is altered.

In still further preferred embodiments, the present invention provides an isolated nucleic acid selected from the group consisting of SEQ ID NOs: 334, 280, 335, 336, 281, 282, 283, 284, 285, 286, 287, 288, 289, 338, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 339, 300, 301, 302, 303, 304, 306, 307, 308, 309, 310, 311, 312, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 326, 327, 328, 329, 330, 333, 337 and nucleic acid sequences that hybridize to any thereof under conditions of low stringency for use in altering the metabolism of a plant.

In still other embodiments, the present invention provides an isolated nucleic acid comprising one of SEQ ID NOs: 47, 58, 336, 288, 291, 297, 302, 304, 313, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, or 332 or sequences that hybridize to the foregoing sequences under conditions of low stringency, wherein expression of the sequences in a plant results in a stunting phenotype. In some preferred embodiments, the present invention provides a vector comprising one of the foregoing sequences. In particularly preferred embodiments, the nucleic acid is operably linked to an exogenous promoter, preferably a plant promoter.

In some embodiments, the nucleic acid is in sense orientation while in other embodiments, the nucleic acid is in antisense orientation. In still further embodiments, the present invention provides a plant transfected with a nucleic acid, composition, or vector comprising one of SEQ ID NOs: 47, 58, 336, 288, 291, 297, 302, 304, 313, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, or 332 or sequences that hybridize to the foregoing sequences under conditions of low stringency. In some embodiments, the present invention provides a seed from such a plant. In other embodiments, the present invention provides a leaf from such a plant.

In still further embodiments, the present invention provides a nucleic acid, composition, or vector comprising one of SEQ ID NOs: 47, 58, 336, 288, 291, 297, 302, 304, 313, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, or 332 or sequences that hybridize to the foregoing sequences under conditions of low stringency for use in stunting the growth of a plant or particular tissue of a plant.

In some embodiments, the present invention provides methods and/or processes for making a transgenic plant comprising providing a vector comprising one of SEQ ID NOs: 47, 58, 336, 288, 291, 297, 302, 304, 313, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, or 332 or sequences that hybridize to the foregoing sequences under conditions of low stringency and a plant; and transfecting the plant with the vector. In other embodiments, the present invention provides methods and/or processes for altering the metabolism of a plant comprising providing a vector comprising one of SEQ ID NOs: 47, 58, 336, 288, 291, 297, 302, 304, 313, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, or 332 or sequences that hybridize to the foregoing sequences under conditions of low stringency and a plant; and transfecting the plant with the vector under conditions such that the growth of the plant or a particular tissue of the plant is stunted.

In still other embodiments, the present invention provides an isolated nucleic acid selected from the group consisting of SEQ ID NOs: 47, 58, 336, 288, 291, 297, 302, 304, 313, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, and nucleic acid sequences that hybridize to any thereof under conditions of low stringency for use in producing a stunting phenotype in a plant.

In further embodiments, the nucleic acid is operably linked to a constitutive or tissue specific promoter. When a tissue specific promoter is utilized, it is contemplated that the dwarfing effect will be confined substantially to the tissue where the nucleic acid is expressed. The present invention is not limited to any particular tissue specific promoter. Indeed, a variety of tissue specific promoters are contemplated, including, but not limited to, leaf, seed, stem, and root specific promoters. In other embodiments, the expression of the nucleic acid is increased as compared to wild-type plants. In still other embodiments, the nucleic acid is expressed under conditions such that a dwarf phenotype is observed. The present invention is not limited to any particular industrial crop. Indeed, a variety of industrial crops are contemplated. In preferred embodiments, the industrial crop is selected from corn, soybean, rice, wheat, oilseed rape, cotton, oats, barley, and potato.

In still other embodiments, the present invention provides a nucleic acid, composition, vector, or plant substantially as described herein in any of the examples.

In still other embodiments, the present invention provides methods and/or processes for the characterization of fractionated biological samples, comprising providing i) one or more fractionated biological samples; ii) a plurality of reference samples; iii) a gas chromatography apparatus; iv) a mass spectroscopy apparatus; and v) data analysis software; and treating the fractionated biological samples and the reference samples with the gas chromatography apparatus to generate chromatographic data corresponding to the fractionated biological samples and the reference samples; treating the fractionated biological samples and the reference samples with the mass spectroscopy apparatus to generate spectroscopic data corresponding to the fractionated biological samples and the reference samples; and processing the chromatographic and the spectroscopic data with the data analysis software, wherein the processing comprises the steps of data reduction, two-dimensional peak matching, quantitative peak differentiation, peak identification, data sorting, and customized reporting.

In some particularly preferred embodiments, data reduction comprises the generation of peak tables corresponding to the chromatographic data. In further embodiments, the peak tables comprise retention time, retention index, raw peak areas, and normalized peak areas data corresponding to the chromatographic data. In still further embodiments, the two-dimensional peak matching comprises the steps of a) matching peaks from the chromatographic data corresponding to the reference sample and the spectral data corresponding to the biological samples to generate paired peaks, wherein the paired peaks have the same retention index; and b) matching the paired peaks based on the spectroscopic data to generate matched peaks and unmatched peaks. In still further embodiments, quantitative peak differentiation comprises further processing the matched peaks to determine a threshold of change for each of the matched peaks.
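By way of illustration only, the two-dimensional peak matching of steps a) and b) may be sketched as follows. This is a minimal sketch under stated assumptions and is not the data analysis software of the invention: the `Peak` record, the retention index tolerance `ri_tolerance`, the cosine similarity measure, and the cutoff `min_similarity` are all hypothetical choices introduced for the example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Peak:
    """One row of a peak table (hypothetical layout)."""
    retention_index: float
    area: float        # normalized peak area
    spectrum: dict     # m/z -> intensity


def cosine_similarity(a, b):
    """Cosine similarity between two spectra stored as m/z -> intensity."""
    dot = sum(v * b.get(mz, 0.0) for mz, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_peaks(reference, sample, ri_tolerance=1.0, min_similarity=0.9):
    """Return (matched, unmatched) sample peaks.

    Step a: pair peaks whose retention indices agree within ri_tolerance
            (an assumed tolerance standing in for "the same retention index").
    Step b: keep a pair only if the mass spectra also agree.
    """
    matched, unmatched = [], []
    for s in sample:
        candidates = [
            r for r in reference
            if abs(r.retention_index - s.retention_index) <= ri_tolerance
        ]
        best = max(
            candidates,
            key=lambda r: cosine_similarity(r.spectrum, s.spectrum),
            default=None,
        )
        if best is not None and cosine_similarity(best.spectrum, s.spectrum) >= min_similarity:
            matched.append((best, s))
        else:
            unmatched.append(s)
    return matched, unmatched
```

In this sketch a sample peak is reported as unmatched either when no reference peak shares its retention index or when a peak at that retention index has a dissimilar spectrum, which corresponds to the two dimensions of the matching.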

In still further embodiments, matched peaks not meeting a minimum threshold for change are discarded. In still further embodiments, peak identification comprises searching mass spectral libraries with the spectroscopic data. In still further embodiments, a searching step generates Chemical Abstracts Service (CAS) numbers corresponding to the peaks. In other embodiments, peak identification further comprises searching biotechnology databases, wherein the biotechnology databases comprise chemical structures. In some embodiments, data sorting comprises generating a preliminary analyst report corresponding to the biological samples. In further embodiments, custom reporting comprises modifying the preliminary analyst report to generate a final report.
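By way of illustration only, the discarding of matched peaks that do not meet a minimum threshold for change may be sketched as follows. The measure of change used here (a symmetric fold change between normalized peak areas) and the default cutoff `min_fold_change` are assumptions made for the example; the specification does not state how the threshold is computed.

```python
def filter_by_change(matched_areas, min_fold_change=2.0):
    """Keep matched peaks whose normalized area changed enough.

    matched_areas: iterable of (reference_area, sample_area) pairs.
    Returns a list of (reference_area, sample_area, fold_change).
    """
    kept = []
    for ref_area, sample_area in matched_areas:
        # Non-positive areas have no defined ratio; this sketch skips them.
        if ref_area <= 0 or sample_area <= 0:
            continue
        ratio = sample_area / ref_area
        # Symmetric fold change: an increase and a decrease of the same
        # magnitude are scored identically.
        fold_change = max(ratio, 1.0 / ratio)
        if fold_change >= min_fold_change:
            kept.append((ref_area, sample_area, fold_change))
    return kept
```

For example, with the assumed two-fold cutoff, a peak whose normalized area rises from 10.0 to 20.0 is retained, while one that rises from 10.0 to 11.0 is discarded.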

DESCRIPTION OF THE DRAWINGS

FIGS. 1 a-pp provide sequences for SEQ ID NOs: 1-154 and 279.

FIGS. 2 a-f provide sequences for SEQ ID NOs: 155-171.

FIGS. 3 a-c provide sequences for SEQ ID NOs: 172-179.

FIGS. 4 a-n provide sequences for SEQ ID NOs: 180-216.

FIGS. 5 a-l provide sequences for SEQ ID NOs: 217-279.

FIGS. 6 a-d provide a Table presenting the orientation of 152 of the sequences.

FIGS. 7 a-7 q provide sequences for SEQ ID NOs: 280-343.

FIGS. 8 a-d summarize the GC/FID parameters used to analyze metabolite samples.

FIGS. 9 a-9 ppp provide sequences for SEQ ID NOs: 344-571.

FIGS. 10 a-10 ffff provide tables summarizing the metabolic changes produced by expression of the indicated sequences in plants.

DEFINITIONS

Before the present proteins, nucleotide sequences, and methods are described, it should be noted that this invention is not limited to the particular methodology, protocols, cell lines, vectors, and reagents described herein as these may vary. It should also be understood that the terminology used herein is for the purpose of describing particular aspects of the invention, and is not intended to limit its scope, which will be limited only by the appended claims.

It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural reference unless the context clearly dictates otherwise. Thus, for example, reference to “a host cell” includes a plurality of such host cells, reference to the “antibody” is a reference to one or more antibodies and equivalents thereof known to those skilled in the art, and so forth.

Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods, devices, and materials are now described. All publications mentioned herein are incorporated herein by reference for the purpose of describing and disclosing the cell lines, vectors, and methodologies which are reported in the publications which might be used in connection with the invention. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.

“Acylate”, as used herein, refers to the introduction of an acyl group into a molecule (for example, acylation).

“Adjacent”, as used herein, refers to a position in a nucleotide sequence immediately 5′ or 3′ to a defined sequence.

“Agonist”, as used herein, refers to a molecule which, when bound to a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention), increases the biological or immunological activity of the polypeptide. Agonists may include proteins, nucleic acids, carbohydrates, or any other molecules which bind to the protein.

“Alterations” in a polynucleotide (for example, a polypeptide encoded by a nucleic acid of the present invention), as used herein, comprise any deletions, insertions, and point mutations in the polynucleotide sequence. Included within this definition are alterations to the genomic DNA sequence which encodes the polypeptide.

“Amino acid sequence”, as used herein, refers to an oligopeptide, peptide, polypeptide, or protein sequence, and fragments or portions thereof, and to naturally occurring or synthetic molecules. “Amino acid sequence” and like terms, such as “polypeptide” or “protein” as recited herein are not meant to limit the amino acid sequence to the complete, native amino acid sequence associated with the recited protein molecule.

“Amplification”, as used herein, refers to the production of additional copies of a nucleic acid sequence and is generally carried out using polymerase chain reaction (PCR) technologies well known in the art (Dieffenbach, C. W. and G. S. Dveksler (1995) PCR Primer, a Laboratory Manual, Cold Spring Harbor Press, Plainview, N.Y.).

“Antibody” refers to intact molecules as well as fragments thereof which are capable of specific binding to an epitopic determinant. Antibodies that bind a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) can be prepared using intact polypeptides or fragments as the immunizing antigen. These antigens may be conjugated to a carrier protein, if desired.

“Antigenic determinant”, “determinant group”, or “epitope of an antigenic macromolecule”, as used herein, refer to any region of the macromolecule with the ability or potential to elicit, and combine with, specific antibody. Determinants exposed on the surface of the macromolecule are likely to be immunodominant, that is, more immunogenic than other (immunorecessive) determinants which are less exposed, while some (for example, those within the molecule) are non-immunogenic (immunosilent). As used herein, “antigenic determinant” refers to that portion of a molecule that makes contact with a particular antibody (for example, an epitope). When a protein or fragment of a protein is used to immunize a host animal, numerous regions of the protein may induce the production of antibodies which bind specifically to a given region or three-dimensional structure on the protein; these regions or structures are referred to as antigenic determinants. An antigenic determinant may compete with the intact antigen (the immunogen used to elicit the immune response) for binding to an antibody.

The term “antisense”, as used herein, refers to a deoxyribonucleotide sequence whose sequence of deoxyribonucleotide residues is in reverse 5′ to 3′ orientation in relation to the sequence of deoxyribonucleotide residues in a sense strand of a DNA duplex. A “sense strand” of a DNA duplex refers to a strand in a DNA duplex which is transcribed by a cell in its natural state into a “sense mRNA.” Thus an “antisense” sequence is a sequence having the same sequence as the non-coding strand in a DNA duplex. The term “antisense RNA” refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA and that blocks the expression of a target gene by interfering with the processing, transport and/or translation of its primary transcript or mRNA. The complementarity of an antisense RNA may be with any part of the specific gene transcript, for example, at the 5′ non-coding sequence, 3′ non-coding sequence, introns, or the coding sequence. In addition, as used herein, antisense RNA may contain regions of ribozyme sequences that increase the efficacy of antisense RNA to block gene expression. “Ribozyme” refers to a catalytic RNA and includes sequence-specific endoribonucleases. “Antisense inhibition” refers to the production of antisense RNA transcripts capable of preventing the expression of the target protein.

“Anti-sense inhibition”, as used herein, refers to a type of gene regulation based on cytoplasmic, nuclear, or organelle inhibition of gene expression due to the presence in a cell of an RNA molecule complementary to at least a portion of the mRNA being translated. It is specifically contemplated that DNA molecules may be from either an RNA virus or mRNA from the host cell genome or from a DNA virus.

“Antagonist” or “inhibitor”, as used herein, refer to a molecule which, when bound to a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention), decreases the biological or immunological activity of the polypeptide. Antagonists and inhibitors may include proteins, nucleic acids, carbohydrates, or any other molecules that bind to the polypeptide.

“Biologically active”, as used herein, refers to a molecule having the structural, regulatory, or biochemical functions of a naturally occurring molecule.

“Cell culture”, as used herein, refers to a proliferating mass of cells that may be in either an undifferentiated or differentiated state.

“Chimeric plasmid”, as used herein, refers to any recombinant plasmid formed (by cloning techniques) from nucleic acids derived from organisms which do not normally exchange genetic information (for example, Escherichia coli and Saccharomyces cerevisiae).

“Chimeric sequence” or “chimeric gene”, as used herein, refer to a nucleotide sequence derived from at least two heterologous parts. The sequence may comprise DNA or RNA.

As used herein, the term “chromatographic data” refers to total ion chromatograms corresponding to individual biological or reference samples. Data such as retention time, retention index, peak areas, and peak areas normalized to internal standards can be extracted from total ion chromatograms to generate “peak tables.”

“Coding sequence”, as used herein, refers to a deoxyribonucleotide sequence which, when transcribed and translated, results in the formation of a cellular polypeptide or a ribonucleotide sequence which, when translated, results in the formation of a cellular polypeptide.

“Compatible”, as used herein, refers to the capability of operating with other components of a system. A vector or plant viral nucleic acid which is compatible with a host is one which is capable of replicating in that host. A coat protein which is compatible with a viral nucleotide sequence is one capable of encapsidating that viral sequence.

“Coding region”, as used herein, refers to that portion of a gene which codes for a protein. The term “non-coding region” refers to that portion of a gene that is not a coding region.

“Complementary” or “complementarity”, as used herein, refer to the Watson-Crick base-pairing of two nucleic acid sequences. For example, the sequence 5′-AGT-3′ binds to the complementary sequence 3′-TCA-5′. Complementarity between two nucleic acid sequences may be “partial”, in which only some of the bases bind to their complement, or it may be complete as when every base in the sequence binds to its complementary base. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands.
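
The base-pairing rule in the example above can be illustrated with a short sketch. It assumes DNA sequences written with the letters A, C, G, and T only; the function names are hypothetical.

```python
# Watson-Crick pairs for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_3_to_5(seq_5_to_3):
    """Complement of a 5'->3' sequence, written 3'->5' (base under base)."""
    return "".join(COMPLEMENT[base] for base in seq_5_to_3.upper())


def reverse_complement(seq_5_to_3):
    """Complementary strand rewritten in the conventional 5'->3' direction."""
    return complement_3_to_5(seq_5_to_3)[::-1]


def fraction_complementary(strand_5_to_3, other_3_to_5):
    """Fraction of aligned positions that form Watson-Crick pairs.

    1.0 corresponds to complete complementarity; values between 0 and 1
    correspond to partial complementarity.
    """
    if not strand_5_to_3 or len(strand_5_to_3) != len(other_3_to_5):
        raise ValueError("strands must be non-empty and of equal length")
    pairs = zip(strand_5_to_3.upper(), other_3_to_5.upper())
    paired = sum(1 for a, b in pairs if COMPLEMENT.get(a) == b)
    return paired / len(strand_5_to_3)
```

Here `complement_3_to_5("AGT")` gives `"TCA"`, matching the 5′-AGT-3′ and 3′-TCA-5′ pair above, and the same strand written 5′ to 3′ is `"ACT"`.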

As used herein, the term “contig” refers to a nucleic acid sequence that is derived from the contiguous assembly of two or more nucleic acid sequences.
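
As a simplified illustration of contiguous assembly, two sequences that share an exact end-to-start overlap can be merged into one longer sequence. This sketch is an assumption made for the example only: it requires an exact overlap, ignores sequencing errors and strand orientation, and does not describe the assembly method actually used for the sequences herein. The function name and the minimum overlap are hypothetical.

```python
def merge_overlap(left, right, min_overlap=3):
    """Merge two sequences when a suffix of `left` equals a prefix of `right`.

    The longest exact overlap of at least `min_overlap` bases is used.
    Returns the merged sequence, or None if no such overlap exists.
    """
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None
```

For example, `"ATGGCGT"` and `"GCGTACC"` overlap by four bases and merge into `"ATGGCGTACC"`.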

“Correlates with expression of a polynucleotide”, as used herein, indicates that the detection of the presence of ribonucleic acid that is similar to a nucleic acid (for example, SEQ ID NOs: 1-571) is indicative of the presence of mRNA encoding a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) in a sample and thereby correlates with expression of the transcript from the polynucleotide encoding the protein.

As used herein, the term “customized reporting” refers to the modification of a preliminary analyst report to generate a modified analyst report. In some embodiments, modifications include, but are not limited to, substitution of underivatized compound names for derivatized compound names and generation of a hit score.

As used herein, the term “data analysis software” refers to software configured for the analysis of spectroscopic and chromatographic data corresponding to fractionated biological and reference samples. Data analysis software is configured to perform data reduction, two-dimensional peak matching, quantitative peak differentiation, peak identification, and customized reporting functions.

As used herein, the term “data reduction” refers to the process of organizing, compiling, and normalizing data (for example, chromatographic data and spectroscopic data). In some embodiments, data reduction includes the normalization of raw chromatogram peak areas and the generation of peak tables. In some embodiments, data reduction also includes the process of filtering peaks based on their normalized area. This step removes peaks that are considered to be background.
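
The normalization and background filtering described above may be sketched as follows, offered only as an illustration. The sketch assumes that raw areas are normalized by dividing by the area of a single internal standard peak and that background is defined by a fixed normalized-area cutoff; the dictionary keys and the cutoff value are hypothetical.

```python
def build_peak_table(raw_peaks, internal_standard_area, min_normalized_area=0.01):
    """Build a peak table with normalized areas, dropping background peaks.

    raw_peaks: list of dicts with "retention_time", "retention_index",
    and "raw_area" entries.
    """
    if internal_standard_area <= 0:
        raise ValueError("internal standard area must be positive")
    table = []
    for peak in raw_peaks:
        normalized = peak["raw_area"] / internal_standard_area
        if normalized < min_normalized_area:
            continue  # treated as background and removed
        row = dict(peak)  # copy so the input records are left unchanged
        row["normalized_area"] = normalized
        table.append(row)
    return table
```

Each retained row carries the four quantities named for peak tables in this specification: retention time, retention index, raw peak area, and normalized peak area.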

As used herein, the term “data sorting” refers to the generation of a preliminary analyst report. In some embodiments, the preliminary analyst report includes equivalence value, retention time, retention index, normalized peak area, peak identification status, compound name, CAS number, mass spectral library, ID number, MS-XCR value, relative % change, notes, and other information about the fractionated biological sample.

“Deletion”, as used herein, refers to a change made in either an amino acid or nucleotide sequence resulting in the absence of one or more amino acids or nucleotides, respectively.

“Encapsidation”, as used herein, refers to the process during virion assembly in which nucleic acid becomes incorporated in the viral capsid or in a head/capsid precursor (for example, in certain bacteriophages).

“Exon”, as used herein, refers to a polynucleotide sequence in a nucleic acid that encodes information for protein synthesis and that is copied and spliced together with other such sequences to form messenger RNA.

“Expression”, as used herein, is meant to incorporate transcription, reverse transcription, and translation.

“Expressed sequence tag (EST)”, as used herein, refers to relatively short single-pass DNA sequences obtained from one or more ends of cDNA clones and RNA derived therefrom. They may be present in either the 5′ or the 3′ orientation. ESTs have been shown to be useful for identifying particular genes.

“Industrial crop”, as used herein, refers to crops grown primarily for consumption by humans or animals or use in industrial processes (for example, as a source of fatty acids for manufacturing or sugars for producing alcohol). It will be understood that either the plant or a product produced from the plant (for example, sweeteners, oil, flour, or meal) can be consumed. Examples of food crops include, but are not limited to, corn, soybean, rice, wheat, oilseed rape, cotton, oats, barley, and potato plants.

“Foreign gene”, as used herein, refers to any sequence that is not native to the virus.

“Fusion protein”, as used herein, refers to a protein containing amino acid sequences from each of two distinct proteins; it is formed by the expression of a recombinant gene in which two coding sequences have been joined together such that their reading frames are in phase. Hybrid genes of this type may be constructed in vitro in order to label the product of a particular gene with a protein which can be more readily assayed (for example, a gene fused with lacZ in E. coli to obtain a fusion protein with β-galactosidase activity). Alternatively, a protein may be linked to a signal peptide to allow its secretion by the cell. The products of certain viral oncogenes are fusion proteins.

As used herein, the term “fractionated biological sample” refers to a biological sample that has been fractionated into two or more fractions based on one or more properties of the sample. For example, in some embodiments (see, for example, Example 19), leaf extracts are fractionated based on extraction with organic solvents.

“Gene”, as used herein, refers to a discrete nucleic acid sequence responsible for a discrete cellular product. The term “gene”, as used herein, refers not only to the nucleotide sequence encoding a specific protein, but also to any adjacent 5′ and 3′ non-coding nucleotide sequence involved in the regulation of expression of the protein encoded by the gene of interest. These non-coding sequences include terminator sequences, promoter sequences, upstream activator sequences, regulatory protein binding sequences, and the like. These non-coding sequence gene regions may be readily identified by comparison with previously identified eukaryotic non-coding sequence gene regions. Furthermore, the person of average skill in the art of molecular biology is able to identify the nucleotide sequences forming the non-coding regions of a gene using well-known techniques such as site-directed mutagenesis, sequential deletion, promoter probe vectors, and the like.

“Growth cycle”, as used herein, is meant to include the replication of a nucleus, an organelle, a cell, or an organism.

“Heterologous”, as used herein, refers to the association of a molecular or genetic element with a distinctly different type of molecular or genetic element.

“Host”, as used herein, refers to a cell, tissue or organism capable of replicating a vector or plant viral nucleic acid and which is capable of being infected by a virus containing the viral vector or plant viral nucleic acid. This term is intended to include prokaryotic and eukaryotic cells, organs, tissues or organisms, where appropriate.

As used herein, the term “homolog” as in a “homolog” of a given nucleic acid sequence, refers to a nucleic acid sequence (for example, a nucleic acid sequence from another organism) that shares a given degree of “homology” with the nucleic acid sequence.

The term “homology” refers to a degree of complementarity. There may be partial homology or complete homology (identity). A partially complementary sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid and is referred to using the functional term “substantially homologous.” The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (the hybridization) of a completely homologous sequence to a target under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (selective) interaction. The absence of non-specific binding may be tested by the use of a second target that lacks even a partial degree of complementarity (for example, less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.

Numerous equivalent conditions may be employed to comprise low stringency conditions; factors such as the length and nature (DNA, RNA, base composition) of the probe and nature of the target (DNA, RNA, base composition, present in solution or immobilized, etc.) and the concentration of the salts and other components (for example, the presence or absence of formamide, dextran sulfate, polyethylene glycol) are considered and the hybridization solution may be varied to generate conditions of low stringency hybridization different from, but equivalent to, the above listed conditions. In addition, the art knows conditions that promote hybridization under conditions of high stringency (for example, increasing the temperature of the hybridization and/or wash steps, the use of formamide in the hybridization solution, etc.).

When used in reference to a double-stranded nucleic acid sequence such as a cDNA or genomic clone, the term “substantially homologous” refers to any probe that can hybridize to either or both strands of the double-stranded nucleic acid sequence under conditions of low stringency as described above.

A gene may produce multiple RNA species that are generated by differential splicing of the primary RNA transcript. cDNAs that are splice variants of the same gene will contain regions of sequence identity or complete homology (representing the presence of the same exon or portion of the same exon on both cDNAs) and regions of complete nonidentity (for example, representing the presence of exon “A” on cDNA 1 wherein cDNA 2 contains exon “B” instead). Because the two cDNAs contain regions of sequence identity they will both hybridize to a probe derived from the entire gene or portions of the gene containing sequences found on both cDNAs; the two splice variants are therefore substantially homologous to such a probe and to each other.

When used in reference to a single-stranded nucleic acid sequence, the term “substantially homologous” refers to any probe that can hybridize to (that is, it is the complement of) the single-stranded nucleic acid sequence under conditions of low stringency as described above.

As used herein, the term “hybridization” is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (for example, the strength of the association between the nucleic acids) are impacted by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the T_m of the formed hybrid, and the G:C ratio within the nucleic acids.

“Hybridization complex”, as used herein, refers to a complex formed between nucleic acid strands by virtue of hydrogen bonding, stacking or other non-covalent interactions between bases. A hybridization complex may be formed in solution or between nucleic acid sequences present in solution and nucleic acid sequences immobilized on a solid support (for example, membranes, filters, chips, pins or glass slides to which cells have been fixed for in situ hybridization).

“Immunologically active” refers to the capability of a natural, recombinant, or synthetic polypeptide, or any oligopeptide thereof, to bind with specific antibodies and induce a specific immune response in appropriate animals or cells.

“Induction” and the terms “induce”, “induction” and “inducible”, as used herein, refer generally to a gene and a promoter operably linked thereto which is in some manner dependent upon an external stimulus, such as a molecule, in order to actively transcribe and/or translate the gene.

“Infection”, as used herein, refers to the ability of a virus to transfer its nucleic acid to a host or introduce viral nucleic acid into a host, wherein the viral nucleic acid is replicated, viral proteins are synthesized, and new viral particles assembled. In this context, the terms “transmissible” and “infective” are used interchangeably herein.

As used herein, the term “insecticidally effective amount,” when used in reference to a polypeptide, refers to the amount of polypeptide necessary to kill an insect or otherwise deter the feeding of an insect from the source which makes the polypeptide available to the insect. When an insect comes into contact with an insecticidally effective amount of a polypeptide delivered via transgenic plant expression, formulated compositions, sprayable protein compositions, a bait matrix or other delivery system, the results are typically death of the insect, or the insects do not feed upon the source which makes the toxins available to the insects.

“Insertion” or “addition”, as used herein, refers to the replacement or addition of one or more nucleotides or amino acids, to a nucleotide or amino acid sequence, respectively.

“In cis”, as used herein, indicates that two sequences are positioned on the same strand of RNA or DNA.

“In trans”, as used herein, indicates that two sequences are positioned on different strands of RNA or DNA.

“Intron”, as used herein, refers to a polynucleotide sequence in a nucleic acid that does not encode information for protein synthesis and is removed before translation of messenger RNA.

“Isolated”, as used herein, refers to a polypeptide or polynucleotide molecule separated not only from other peptides, DNAs, or RNAs, respectively, that are present in the natural source of the macromolecule but also from other macromolecules and preferably refers to a macromolecule found in the presence of (if anything) only a solvent, buffer, ion or other component normally present in a solution of the same. “Isolated” and “purified” do not encompass either natural materials in their native state or natural materials that have been separated into components (for example, in an acrylamide gel) but not obtained either as pure substances or as solutions.

“Kinase”, as used herein, refers to an enzyme (for example, hexokinase and pyruvate kinase) which catalyzes the transfer of a phosphate group from one substrate (commonly ATP) to another.

“Marker” or “genetic marker”, as used herein, refer to a genetic locus which is associated with a particular, usually readily detectable, genotype or phenotypic characteristic (for example, an antibiotic resistance gene).

“Metabolome”, as used herein, indicates the complement of relatively low molecular weight molecules that is present in a plant, plant part, or plant sample, or in a suspension or extract thereof. Examples of such molecules include, but are not limited to: acids and related compounds; mono-, di-, and tri-carboxylic acids (saturated, unsaturated, aliphatic and cyclic, aryl, alkaryl); aldo-acids, keto-acids; lactone forms; gibberellins; abscisic acid; alcohols, polyols, derivatives, and related compounds; ethyl alcohol, benzyl alcohol, methanol; propylene glycol, glycerol, phytol; inositol, furfuryl alcohol, menthol; aldehydes, ketones, quinones, derivatives, and related compounds; acetaldehyde, butyraldehyde, benzaldehyde, acrolein, furfural, glyoxal; acetone, butanone; anthraquinone; carbohydrates; mono-, di-, tri-saccharides; alkaloids, amines, and other bases; pyridines (including nicotinic acid, nicotinamide); pyrimidines (including cytidine, thymine); purines (including guanine, adenine, xanthines/hypoxanthines, kinetin); pyrroles; quinolines (including isoquinolines); morphinans, tropanes, cinchonans; nucleotides, oligonucleotides, derivatives, and related compounds; guanosine, cytosine, adenosine, thymidine, inosine; amino acids, oligopeptides, derivatives, and related compounds; esters; phenols and related compounds; heterocyclic compounds and derivatives; pyrroles, tetrapyrroles (corrinoids and porphines/porphyrins, with or without metal ion); flavonoids; indoles; lipids (including fatty acids and triglycerides), derivatives, and related compounds; carotenoids, phytoene; and sterols, isoprenoids including terpenes.

“Modulate”, as used herein, refers to a change or an alteration in the biological activity of a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention). Modulation may be an increase or a decrease in protein activity, a change in binding characteristics, or any other change in the biological, functional or immunological properties of the polypeptide.

“Movement protein”, as used herein, refers to a noncapsid protein required for cell to cell movement of replicons or viruses in plants.

“Multigene family”, as used herein, refers to a set of genes descended by duplication and variation from some ancestral gene. Such genes may be clustered together on the same chromosome or dispersed on different chromosomes. Examples of multigene families include those which encode the histones, hemoglobins, immunoglobulins, histocompatibility antigens, actins, tubulins, keratins, collagens, heat shock proteins, salivary glue proteins, chorion proteins, cuticle proteins, yolk proteins, and phaseolins.

“Non-native”, as used herein, refers to any RNA sequence that promotes production of subgenomic mRNA including, but not limited to, 1) plant viral promoters such as ORSV and brome mosaic virus, 2) viral promoters from other organisms such as human Sindbis viral promoter, and 3) synthetic promoters.

“Nucleic acid sequence”, as used herein, refers to a polymer of nucleotides in which the 3′ position of one nucleotide sugar is linked to the 5′ position of the next by a phosphodiester bridge. In a linear nucleic acid strand, one end typically has a free 5′ phosphate group, the other a free 3′ hydroxyl group. Nucleic acid sequences may be used herein to refer to oligonucleotides, or polynucleotides, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin which may be single- or double-stranded, and represent the sense or antisense strand.

“Polypeptide”, as used herein, refers to an amino acid sequence obtained from any species and from any source whether natural, synthetic, semi-synthetic, or recombinant.

“Oil-producing species,” as used herein, refers to plant species which produce and store triacylglycerol in specific organs, primarily in seeds. Such species include soybean (Glycine max), rapeseed and canola (including Brassica napus and B. campestris), sunflower (Helianthus annuus), cotton (Gossypium hirsutum), corn (Zea mays), cocoa (Theobroma cacao), safflower (Carthamus tinctorius), oil palm (Elaeis guineensis), coconut palm (Cocos nucifera), flax (Linum usitatissimum), castor (Ricinus communis) and peanut (Arachis hypogaea). The group also includes non-agronomic species which are useful in developing appropriate expression vectors such as tobacco, rapid cycling Brassica species, and Arabidopsis thaliana, and wild species which may be a source of unique fatty acids.

“Operably linked” refers to a juxtaposition of components, particularly nucleotide sequences, such that the normal function of the components can be performed. Thus, a coding sequence that is operably linked to regulatory sequences refers to a configuration of nucleotide sequences wherein the coding sequences can be expressed under the regulatory control, that is, transcriptional and/or translational control, of the regulatory sequences.

“Origin of assembly”, as used herein, refers to a sequence where self-assembly of the viral RNA and the viral capsid protein initiates to form virions.

As used herein, the term “ortholog” refers to genes that have evolved from an ancestral locus.

“Outlier peak”, as used herein, indicates a peak of a chromatogram of a test sample, or the relative or absolute detected response data, or amount or concentration data thereof. An outlier peak may: 1) have a significantly different peak height or area as compared to a like chromatogram of a control sample; or 2) be an additional or missing peak as compared to a like chromatogram of a control sample.
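
The two kinds of outlier peak defined above may be illustrated with the following sketch. It assumes that peaks in the test and control chromatograms have already been assigned common identifiers, and it uses a simple fold-change cutoff as a stand-in for “significantly different”; the cutoff, the labels, and the function name are hypothetical and do not represent a statistical test.

```python
def find_outlier_peaks(control, test, min_fold_change=2.0):
    """Classify outlier peaks in a test chromatogram against a control.

    control, test: dicts mapping a peak identifier to a positive peak area.
    Returns a dict mapping each outlier peak to "changed", "additional",
    or "missing".
    """
    outliers = {}
    for peak_id in sorted(control.keys() | test.keys()):
        if peak_id not in control:
            outliers[peak_id] = "additional"  # present only in the test sample
        elif peak_id not in test:
            outliers[peak_id] = "missing"     # present only in the control
        else:
            if control[peak_id] <= 0:
                raise ValueError("control peak areas must be positive")
            ratio = test[peak_id] / control[peak_id]
            if ratio >= min_fold_change or ratio <= 1.0 / min_fold_change:
                outliers[peak_id] = "changed"
    return outliers
```

A peak whose area differs by less than the assumed two-fold cutoff is not reported at all in this sketch.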

As used herein, the term “overexpression” refers to the production of a gene product in transgenic organisms that exceeds levels of production in normal or non-transformed organisms.

As used herein, the term “cosuppression” refers to the expression of a foreign gene which has substantial homology to an endogenous gene resulting in the suppression of expression of both the foreign and the endogenous gene. As used herein, the term “altered levels” refers to the production of gene product(s) in transgenic organisms in amounts or portions that differ from that of normal or non-transformed organisms.

As used herein, the term “peak identification” refers to the identification of a chemical compound corresponding to a given peak. In some embodiments, peaks are identified by searching mass spectral libraries. In other embodiments, peaks are identified by searching additional libraries or databases (for example, biotechnology databases).

As used herein, the term “pesticidal activity” refers to the function of a peptide as an orally active insect control agent, a toxic effect against pests or insects, or the ability to disrupt or deter insect feeding which may or may not cause death of the insect.

“Phenotype” or “phenotypic trait(s)”, as used herein, refers to an observable property or set of properties resulting from the expression of a gene. “Visual phenotype”, as used herein, refers to a plant displaying a symptom or group of symptoms that meet defined criteria. “Stunting phenotype”, as used herein, refers to a phenotype where any stunting symptoms are present in any plant region. Stunting symptoms include reduced internodal length, reduced petiole length, reduced shoot apex length and reduced leaf blade diameter (along two axes). Other symptoms that are typically viral such as mild (level 2 severity code) chlorosis and blade curling may be present as well. If any additional symptoms such as necrosis, wilting or etching are present (excluding the inoculated leaves) at any level the plant does not fit the criteria for a stunting phenotype. “Altered metabolism phenotype”, as used herein, refers to a phenotype wherein the production of a given metabolite is altered (for example, increased or decreased) in a plant. Examples of metabolites which can be altered in a plant include, but are not limited to, acids, fatty acids, amino acids, hydroxy fatty acids, branched fatty acids, carbohydrates, hydrocarbons, glycerides, phenols, sterols, oxygenated terpenes, and other isoprenoids, alcohols, alkenes and alkynes.

“Plant”, as used herein, refers to any plant and progeny thereof. The term also includes parts of plants, including seed, cuttings, tubers, fruit, flowers, etc.

“Plant cell”, as used herein, refers to the structural and physiological unit of plants, consisting of a protoplast and the cell wall.

“Plant organ”, as used herein, refers to a distinct and visibly differentiated part of a plant, such as root, stem, leaf or embryo.

“Plant tissue”, as used herein, refers to any tissue of a plant in planta or in culture. This term is intended to include a whole plant, plant cell, plant organ, protoplast, cell culture, or any group of plant cells organized into a structural and functional unit.

“Portion”, as used herein, with regard to a protein (“a portion of a given protein”) refers to fragments of that protein. The fragments may range in size from four amino acid residues to the entire amino acid sequence minus one amino acid (10 nucleotides, 20, 30, 40, 50, 100, 200, etc.). A “portion” is preferably at least 25 nucleotides, more preferably at least 50 nucleotides, and even more preferably at least 100 nucleotides.

“Positive-sense inhibition”, as used herein, refers to a type of gene regulation based on cytoplasmic inhibition of gene expression due to the presence in a cell of an RNA molecule substantially homologous to at least a portion of the mRNA being translated.

“Production cell”, as used herein, refers to a cell, tissue or organism capable of replicating a vector or a viral vector, but which is not necessarily a host to the virus. This term is intended to include prokaryotic and eukaryotic cells, organs, tissues or organisms, such as bacteria, yeast, fungus, and plant tissue.

“Promoter”, as used herein, refers to the 5′-flanking, non-coding sequence adjacent a coding sequence which is involved in the initiation of transcription of the coding sequence.

“Protoplast”, as used herein, refers to an isolated plant cell without cell walls, having the potency for regeneration into cell culture or a whole plant.

“Purified”, as used herein, when referring to a peptide or nucleotide sequence, indicates that the molecule is present in the substantial absence of other biological macromolecules, for example, polypeptides, polynucleic acids, and the like of the same type. The term “purified” as used herein preferably means at least 95% by weight, more preferably at least 99.8% by weight, of biological macromolecules of the same type present (but water, buffers, and other small molecules, especially molecules having a molecular weight of less than 1000, can be present).

The term “pure”, as used herein, preferably has the same numerical limits as “purified” immediately above. “Substantially purified”, as used herein, refers to nucleic or amino acid sequences that are removed from their natural environment, isolated or separated, and are at least 60% free, preferably 75% free, and most preferably 90% free from other components with which they are naturally associated.

As used herein, the term “quantitative peak differentiation” refers to the process of confirming peak matches by calculating their relative quantitative differentiation, which is expressed as a percent change of the sample peak area relative to the area of the reference peak. A predetermined threshold for the change is used to confirm that the peaks are a match.

“Recombinant plant viral nucleic acid”, as used herein, refers to a plant viral nucleic acid which has been modified to contain non-native nucleic acid sequences. These non-native nucleic acid sequences may be from any organism or purely synthetic; however, they may also include nucleic acid sequences naturally occurring in the organism into which the recombinant plant viral nucleic acid is to be introduced.

“Recombinant plant virus”, as used herein, refers to a plant virus containing a recombinant plant viral nucleic acid.

As used herein, the term “reference sample” refers to a sample containing a plurality of known biological macromolecules.

“Regulatory region” or “regulatory sequence”, as used herein, in reference to a specific gene refers to the non-coding nucleotide sequences within that gene that are necessary or sufficient to provide for the regulated expression of the coding region of a gene. Thus the term regulatory region includes promoter sequences, regulatory protein binding sites, upstream activator sequences, and the like. Specific nucleotides within a regulatory region may serve multiple functions. For example, a specific nucleotide may be part of a promoter and participate in the binding of a transcriptional activator protein.

“Replication origin”, as used herein, refers to the minimal terminal sequences in linear viruses that are necessary for viral replication.

“Replicon”, as used herein, refers to an arrangement of RNA sequences generated by transcription of a transgene that is integrated into the host DNA that is capable of replication in the presence of a helper virus. A replicon may require sequences in addition to the replication origins for efficient replication and stability.

As used herein, the term “resistance to insects,” when used in reference to plants, refers to the ability of a plant to resist insects or other plants.

“Sample”, as used herein, is used in its broadest sense. A biological sample suspected of containing nucleic acid encoding a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) or fragments thereof may comprise a tissue, a cell, an extract from cells, chromosomes isolated from a cell (for example, a spread of metaphase chromosomes), genomic DNA (in solution or bound to a solid support such as for Southern analysis), RNA (in solution or bound to a solid support such as for northern analysis), cDNA (in solution or bound to a solid support), and the like.

“Silent mutation”, as used herein, refers to a mutation which has no apparent effect on the phenotype of the organism.

“Site-directed mutagenesis”, as used herein, refers to the in-vitro induction of mutagenesis at a specific site in a given target nucleic acid molecule.

“Subgenomic promoter”, as used herein, refers to a promoter of a subgenomic mRNA of a viral nucleic acid.

“Specific binding” or “specifically binding”, as used herein, in reference to the interaction of an antibody and a protein or peptide, mean that the interaction is dependent upon the presence of a particular structure (the antigenic determinant or epitope) on the protein; in other words, the antibody is recognizing and binding to a specific protein structure rather than to proteins in general.

As used herein, the term “T_(m)” is used in reference to the “melting temperature.” The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. The equation for calculating the T_(m) of nucleic acids is well known in the art. As indicated by standard references, a simple estimate of the T_(m) value may be calculated by the equation: T_(m) = 81.5 + 0.41(% G+C), when a nucleic acid is in aqueous solution at 1 M NaCl (see, for example, Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization [1985]). Other references include more sophisticated computations that take structural as well as sequence characteristics into account for the calculation of T_(m).
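By way of illustration only, the simple estimate T_(m) = 81.5 + 0.41(% G+C) may be computed as in the following Python sketch. The function name `estimate_tm` is hypothetical, and the sketch assumes a plain string of A, C, G and T characters in aqueous solution at 1 M NaCl; it does not implement the more sophisticated computations mentioned above.

```python
def estimate_tm(sequence: str) -> float:
    """Estimate the melting temperature in degrees C as 81.5 + 0.41 * (%G+C).

    Assumes the simple 1 M NaCl estimate quoted in the text; probe length,
    mismatches and secondary structure are ignored.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Count G and C residues and express them as a percentage of all bases.
    gc_count = sum(1 for base in seq if base in "GC")
    percent_gc = 100.0 * gc_count / len(seq)
    return 81.5 + 0.41 * percent_gc
```

For a sequence with no G or C residues the estimate reduces to 81.5, and each additional percentage point of G+C content raises it by 0.41 degrees.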

As used herein, the term “stringency” is used in reference to the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. Those skilled in the art will recognize that “stringency” conditions may be altered by varying the parameters just described either individually or in concert. With “high stringency” conditions, nucleic acid base pairing will occur only between nucleic acid fragments that have a high frequency of complementary base sequences (for example, hybridization under “high stringency” conditions may occur between homologs with about 85-100% identity, preferably about 70-100% identity). With medium stringency conditions, nucleic acid base pairing will occur between nucleic acids with an intermediate frequency of complementary base sequences (for example, hybridization under “medium stringency” conditions may occur between homologs with about 50-70% identity). Thus, conditions of “weak” or “low” stringency are often required with nucleic acids that are derived from organisms that are genetically diverse, as the frequency of complementary sequences is usually less.

“High stringency conditions” when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42° C. in a solution consisting of 5×SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5× Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA, followed by washing in a solution comprising 0.1×SSPE, 1.0% SDS at 42° C. when a probe of about 500 nucleotides in length is employed.

“Medium stringency conditions” when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42° C. in a solution consisting of 5×SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5× Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA, followed by washing in a solution comprising 1.0×SSPE, 1.0% SDS at 42° C. when a probe of about 500 nucleotides in length is employed.

“Low stringency conditions” comprise conditions equivalent to binding or hybridization at 42° C. in a solution consisting of 5×SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5× Denhardt's reagent [50× Denhardt's contains per 500 ml: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 μg/ml denatured salmon sperm DNA, followed by washing in a solution comprising 5×SSPE, 0.1% SDS at 42° C. when a probe of about 500 nucleotides in length is employed.

“Substitution”, as used herein, refers to a change made in an amino acid or nucleotide sequence which results in the replacement of one or more amino acids or nucleotides by different amino acids or nucleotides, respectively.

As used herein, the term “susceptibility to insects,” when used in reference to plants, refers to the extent that a plant is subject to damage by insects or other plants.

“Symptom”, as used herein, refers to a visual condition resulting from the action of the GENEWARE™ vector or the clone insert.

“Systemic infection”, as used herein, denotes infection throughout a substantial part of an organism including mechanisms of spread other than mere direct cell inoculation but rather including transport from one infected cell to additional cells either nearby or distant.

As used herein, the term “tolerance to insects,” when used in reference to plants, refers to the ability of a plant to withstand damage caused by pests or insects.

“Transcription”, as used herein, refers to the production of an RNA molecule by RNA polymerase as a complementary copy of a DNA sequence.

“Transcription termination region”, as used herein, refers to the sequence that controls formation of the 3′ end of the transcript. Self-cleaving ribozymes and polyadenylation sequences are examples of transcription termination sequences.

“Transformation”, as used herein, describes a process by which exogenous DNA enters and changes a recipient cell. It may occur under natural or artificial conditions using various methods well known in the art. Transformation may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell being transformed and may include, but is not limited to, viral infection, electroporation, lipofection, and particle bombardment. Such “transformed” cells include stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome. They also include cells which transiently express the inserted DNA or RNA for limited periods of time.

The term “transfection”, as used herein, refers to the introduction of foreign DNA into eukaryotic cells. Transfection may be accomplished by a variety of means known to the art including calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection, polybrene-mediated transfection, electroporation, microinjection, liposome fusion, lipofection, protoplast fusion, retroviral infection, and biolistics.

“Transposon”, as used herein, refers to a nucleotide sequence such as a DNA or RNA sequence which is capable of transferring location or moving within a gene, a chromosome or a genome.

“Transgenic plant”, as used herein, refers to a plant which contains a foreign nucleotide sequence inserted into either its nuclear genome or organellar genome.

“Transgene”, as used herein, refers to the DNA sequence coding for the replicon that is inserted into the host DNA.

As used herein, the term “two-dimensional peak matching” refers to the pairing or matching of peaks in reference and fractionated biological samples. Peaks are first paired based on their retention index. A match is then confirmed by spectral matching.
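By way of illustration only, the two stages of such peak matching may be sketched in Python as follows. Everything specific in this sketch is an assumption made for illustration and is not taken from the present disclosure: the function names, the representation of a spectrum as a mapping from m/z value to intensity, the use of cosine similarity as the spectral match, and the default values of `ri_tolerance` and `min_similarity`.

```python
import math

def cosine_similarity(spec_a: dict, spec_b: dict) -> float:
    """Cosine similarity of two spectra given as {m/z: intensity} mappings."""
    shared_axis = set(spec_a) | set(spec_b)
    dot = sum(spec_a.get(mz, 0.0) * spec_b.get(mz, 0.0) for mz in shared_axis)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_peaks(reference: dict, sample: dict,
                ri_tolerance: float = 5.0, min_similarity: float = 0.9) -> list:
    """Pair peaks by retention index, then confirm each pair by spectrum.

    Both inputs map a peak name to a (retention_index, spectrum) tuple.
    The tolerance and similarity cutoff are hypothetical defaults.
    """
    matches = []
    for ref_name, (ref_ri, ref_spec) in reference.items():
        for smp_name, (smp_ri, smp_spec) in sample.items():
            # First dimension: retention indices must agree within tolerance.
            if abs(ref_ri - smp_ri) > ri_tolerance:
                continue
            # Second dimension: the spectra must also be sufficiently similar.
            if cosine_similarity(ref_spec, smp_spec) >= min_similarity:
                matches.append((ref_name, smp_name))
    return matches
```

In this sketch a sample peak that elutes near a reference peak but has a different spectrum is rejected at the second stage, as is a peak with the same spectrum but a distant retention index at the first.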

“Variants” of a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention), as used herein, refers to a sequence resulting when a polypeptide is altered by one or more amino acids. The variant may have “conservative” changes, wherein a substituted amino acid has similar structural or chemical properties, for example, replacement of leucine with isoleucine. More rarely, a variant may have “nonconservative” changes, for example, replacement of a glycine with a tryptophan. Variants may also include sequences with amino acid deletions or insertions, or both. Guidance in determining which amino acid residues may be substituted, inserted, or deleted without abolishing biological or immunological activity may be found using computer programs well known in the art.

“Vector”, as used herein, refers to a self-replicating DNA or RNA molecule which transfers a nucleic acid segment between cells.

“Virion”, as used herein, refers to a particle composed of viral RNA and viral capsid protein.

“Virus”, as used herein, refers to an infectious agent composed of a nucleic acid encapsidated in a protein. A virus may be a mono-, di-, tri- or multipartite virus.

DESCRIPTION OF THE INVENTION

A number of chemical compounds have been identified by others that cause dwarfing or stunting of plants (for example, U.S. Pat. Nos. 4,045,459; 3,931,235; 3,947,264; and 3,818,046). However, the use of many of these chemical compounds in environmental settings is limited by their potential toxicity. Use of the nucleic acids and constructs described herein to dwarf or stunt plants avoids the problems associated with the release of toxic chemicals into the environment.

Accordingly, the present invention provides nucleic acid sequences that, when expressed in plants, cause stunting of plant growth. The present invention is not limited to a particular mechanism of action. Indeed, an understanding of the mechanism of action is not necessary to practice the present invention. However, it is contemplated that the stunting phenotype is caused by either overexpression, antisense inhibition, or cosuppression mediated by expression of the nucleic acid sequence.

Genes that are demonstrated to effect growth regulation of the plant (stunting, elongation, etc.) are useful for a number of purposes, including, but not limited to, the following: a) Creation of dwarf varieties of any plant species; b) Creation of plants that have controlled meristematic growth such that a desired plant height or plant form is achieved; c) Creation of plants that have a lengthened vegetative phase of plant development to achieve increased plant mass and yield; d) Creation of plants that have a shortened vegetative phase of plant development to achieve yields in a short growing season; and e) Creation of plants that undergo senescence or programmed death at a desired time.

I. Identification of Nucleotide and Amino Acid Sequences

The invention is based on the discovery of putative known and unknown deoxyribonucleic acid (DNA) and amino acid sequences identified in one or more metabolic pathways that lead to dwarfism and stunting in plants and the use of these sequences in agriculture to create dwarf varieties of any plant species.

Nucleic acids encoding the polypeptides of the present invention were first identified in Biosource clones generated from an ABRC cDNA library. The cDNA library had been constructed in the GENEWARE vector. The GENEWARE vector is described in U.S. application Ser. No. 09/008,186. Each clone in the complete set of clones from the GENEWARE library was used to prepare an infectious viral unit. An infectious unit corresponding to each clone was used to inoculate Nicotiana benthamiana (a dicotyledonous plant). The plants were grown under identical conditions and a phenotypic analysis of each plant was carried out. The stunting and dwarfing phenotype was observed in the plants that had been infected by infectious units created from the nucleic acids of the present invention. In other embodiments, sequences causing an altered metabolism phenotype were identified.

Following the identification of the stunting phenotype in plant samples, further biochemical analyses of the infected plant's tissue were carried out. Function was ascertained by a determination of at least one variation produced in the metabolome of the infected plant using the analytical methodologies and data processing techniques described in the Examples section below. The nucleotide sequences of the present invention were analyzed using bioinformatics methods as described below.

II. Bioinformatics Methods

A. Phred, Phrap and Consed

Phred, Phrap and Consed are a set of programs which read DNA sequencer traces, make base calls, assemble the shotgun DNA sequence data and analyze the sequence regions that are likely to contribute to errors. Phred is the initial program used to read the sequencer trace data, call the bases and assign quality values to the bases. Phred uses a Fourier-based method to examine the base traces generated by the sequencer. The output files from Phred are written in FASTA, phd or scf format. Phrap is used to assemble contiguous sequences from only the highest quality portion of the sequence data output by Phred. Phrap is amenable to high-throughput data collection. Finally, Consed is used as a finishing tool to assign error probabilities to the sequence data. Detailed description of the Phred, Phrap and Consed software and its use can be found in the following references: Ewing et al., Genome Res., 8:175 [1998]; Ewing and Green, Genome Res., 8:186 [1998]; Gordon et al., Genome Res., 8:195 [1998].

B. BLAST

The BLAST set of programs may be used to compare large numbers of sequences and obtain homologies to known protein families. These homologies provide information regarding the function of newly sequenced genes. Detailed description of the BLAST software and its uses can be found in the following references: Altschul et al., J. Mol. Biol., 215:403 [1990]; Altschul, J. Mol. Biol., 219:555 [1991].

Generally, BLAST performs sequence similarity searching and is divided into 5 basic subroutines: (1) BLASTP compares an amino acid sequence to a protein sequence database; (2) BLASTN compares a nucleotide sequence to a nucleic acid sequence database; (3) BLASTX compares a nucleotide sequence translated in all 6 reading frames to a protein sequence database; (4) TBLASTN compares a protein sequence to a nucleotide sequence database that is translated into all 6 reading frames; (5) TBLASTX compares the 6-frame translation of a nucleotide sequence to the 6-frame translation of a nucleotide sequence database. Subroutines (3)-(5) may be used to identify weak similarities in nucleic acid sequence.
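By way of illustration only, the six-frame translation on which subroutines (3)-(5) rely may be sketched in Python as follows. The function names are hypothetical and are not part of the BLAST software; the sketch assumes the standard genetic code, marks stop codons as `*` and marks any codon containing a character other than A, C, G or T as `X`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna: str) -> str:
    """Translate complete codons from the first base; unknown codons give X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translations(dna: str) -> dict:
    """Translate three forward frames (+1..+3) and three reverse frames (-1..-3)."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Each of the six resulting peptide strings can then be compared against protein sequences, which is why a translated search can detect similarity that is obscured at the nucleotide level by synonymous codon differences.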

The BLAST program is based on the High-scoring Segment Pair (HSP): two sequence fragments of arbitrary but equal length whose alignment is locally maximized and whose alignment score meets or exceeds a cutoff threshold. BLAST determines multiple HSP sets statistically using sum statistics. The score of the HSP is then related to its expected frequency of chance occurrence, E. The value E is dependent on several factors such as the scoring system, residue composition of sequences, length of query sequence and total length of database. These E values are listed in the output file, typically in a histogram format, and are useful in determining levels of statistical significance at the user's predefined expectation threshold. Finally, the Smallest Sum Probability, P(N), is the probability of observing the shown matched sequences by chance alone and is typically in the range of 0-1.
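By way of illustration only, the dependence of E on the score, the query length and the database length may be sketched with the commonly cited Karlin-Altschul form E = K·m·n·e^(−λS), together with the relation P = 1 − e^(−E) for the probability of observing at least one such match by chance. The sketch below covers a single HSP only and does not implement sum statistics. The function names and the default values of `lam` and `k` are placeholders chosen for illustration; in practice these parameters depend on the scoring system and residue composition.

```python
import math

def expected_hits(score: float, query_len: int, db_len: int,
                  lam: float = 0.318, k: float = 0.134) -> float:
    """Expected number of chance HSPs scoring at least `score`.

    Uses E = K * m * n * exp(-lambda * S). The defaults for lam and k are
    illustrative placeholders, not values taken from the present disclosure.
    """
    return k * query_len * db_len * math.exp(-lam * score)

def p_value(e_value: float) -> float:
    """Probability of at least one chance match, P = 1 - exp(-E)."""
    return -math.expm1(-e_value)
```

Under these assumptions a higher score lowers E exponentially, while a longer query or a larger database raises it in direct proportion, and P stays within the range 0-1 for any non-negative E.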

BLAST measures sequence similarity using a matrix of similarity scores for all possible pairs of residues; these specify scores for aligning pairs of amino acids. The matrix of choice for a specific use depends on several factors, including the length of the query sequence and whether a close or distant relationship between sequences is suspected. Several matrices are available, including PAM40, PAM120, PAM250, BLOSUM62 and BLOSUM50. Altschul et al. (1990) found PAM120 to be the most broadly sensitive matrix (PAM denoting point accepted mutations per 100 residues). However, in some cases the PAM120 matrix may not find short but strong or long but weak similarities between sequences. In these cases, pairs of PAM matrices may be used, such as PAM40 and PAM250, and the results compared. Typically, PAM40 is used for database searching with a query 9-21 residues long, while PAM250 is used for lengths of 47-123.

The BLOSUM (Blocks Substitution Matrix) series of matrices are constructed based on percent identity between two sequence segments of interest. Thus, the BLOSUM62 matrix is based on a matrix of sequence segments in which the members are less than 62% identical. BLOSUM62 shows very good performance for BLAST searching. However, other BLOSUM matrices, like the PAM matrices, may be useful in other applications. For example, BLOSUM45 is particularly strong in profile searching.

C. FASTA

The FASTA suite of programs permits the evaluation of DNA and protein similarity based on local sequence alignment. The FASTA search algorithm utilizes Smith/Waterman- and Needleman/Wunsch-based optimization methods. These algorithms consider all of the alignment possibilities between the query sequence and the library in the highest-scoring sequence regions. The search algorithm proceeds in four basic steps:

1. The identities or pairs of identities between the two DNA or protein sequences are determined. The ktup parameter, as set by the user, is operative and determines how many consecutive sequence identities are required to indicate a match.
2. The regions identified in step 1 are re-scored using a PAM or BLOSUM matrix. This allows conservative replacements and runs of identities shorter than that specified by ktup to contribute to the similarity score.
3. The single best-scoring initial region is used to characterize pairwise similarity, and these scores are used to rank the library sequences.
4. The highest scoring library sequences are aligned using the Smith-Waterman algorithm.

This final comparison takes into account the possible alignments of the query and library sequence in the highest scoring region.
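By way of illustration only, step 1 above may be sketched in Python as follows. The sketch is not the FASTA implementation: the function name `ktup_diagonal_hits` is hypothetical, and grouping matches by diagonal (subject position minus query position) is a simplification used here to show how runs of consecutive identities cluster. Steps 2-4 (matrix re-scoring, ranking and Smith-Waterman alignment) are not shown.

```python
from collections import Counter, defaultdict

def ktup_diagonal_hits(query: str, subject: str, ktup: int = 2) -> Counter:
    """Count exact ktup-length word matches on each alignment diagonal.

    A diagonal is identified by (subject position - query position), so
    matches that belong to the same ungapped region share one diagonal.
    """
    # Index every ktup-length word of the query by its start position.
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)

    # Scan the subject and record the diagonal of every shared word.
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1
    return diagonals
```

A larger ktup demands longer runs of consecutive identities before a match is recorded, which yields fewer candidate regions at the cost of sensitivity.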

Further detailed description of the FASTA software and its use can be found in the following reference: Pearson and Lipman, Proc. Natl. Acad. Sci., 85:2444 [1988].

D. Pfam

Despite the large number of different protein sequences determined through genomics-based approaches, relatively few structural and functional domains are known. Pfam is a computational method that utilizes a collection of multiple alignments and profile hidden Markov models of protein domain families to classify existing and newly found protein sequences into structural families. Detailed description of the Pfam software and its uses can be found in the following references: Sonnhammer et al., Proteins: Structure, Function and Genetics, 28:405 [1997]; Sonnhammer et al., Nucleic Acids Res., 26:320 [1998]; Bateman et al., Nucleic Acids Res., 27:260 [1999].

Pfam 3.1, the latest version, includes 54% of proteins in SWISS_PROT and SP-TrEMBL-5 as a match to the database and includes expectation values for matches. Pfam consists of parts A and B. Pfam-A contains a hidden Markov model and includes curated families. Pfam-B uses the Domainer program to cluster sequence segments not included in Pfam-A. Domainer uses pairwise homology data from Blastp to construct aligned families.

Alternative protein family databases that may be used include PRINTS and BLOCKS, which both are based on a set of ungapped blocks of aligned residues. However, these programs typically contain short conserved regions, whereas Pfam represents a library of complete domains that facilitates automated annotation. Comparisons of Pfam profiles may also be performed using genomic and EST data with the programs Genewise and ESTwise, respectively. Both of these programs allow for introns and frame shifting errors.

E. BLOCKS

The determination of sequence relationships between unknown sequences and those that have been categorized can be problematic because background noise increases with the number of sequences, especially at a low level of similarity detection. One recent approach to this problem has been tested that efficiently detects and confirms weak or distant relationships among protein sequences based on a database of blocks. The BLOCKS database provides multiple alignments of sequences and contains blocks or protein motifs found in known families of proteins.

Other programs such as PRINTS and Prodom also provide alignments; however, the BLOCKS database differs in the manner in which the database was constructed. Construction of the BLOCKS database proceeds as follows: one starts with a group of sequences that presumably have one or more motifs in common, such as those from the PROSITE database. The PROTOMAT program then uses a motif finding program to scan sequences for similarity, looking for spaced triplets of amino acids. The located blocks are then entered into the MOTOMAT program for block assembly. Weights are computed for all sequences. Following construction of a BLOCKS database, one can use BLIMPS to perform searches of the BLOCKS database. Detailed description of the construction and use of a BLOCKS database can be found in the following references: Henikoff, S. and Henikoff, J. G., Genomics, 19:97 [1994]; Henikoff, J. G. and Henikoff, S., Meth. Enz., 266:88 [1996].

F. PRINTS

The PRINTS database of protein family fingerprints can be used in addition to BLOCKS and PROSITE. These databases are considered to be secondary databases because they diagnose the relationship between sequences that yield function information. Presently, however, it is not recommended that these databases be used alone. Rather, it is strongly suggested that these pattern databases be used in conjunction with each other so that a direct comparison of results can be made to analyze their robustness.

Generally, these programs utilize pattern recognition to discover motifs within protein sequences. However, PRINTS goes one step further: it takes into account not simply single motifs but several motifs simultaneously that might characterize a family signature. Other programs, such as PROSITE, rely on pattern recognition but are limited by the fact that query sequences must match them exactly. Thus, sequences that vary slightly will be missed. In contrast, the PRINTS database fingerprinting approach is capable of identifying distant relatives because sequences do not have to match the query exactly. Instead they are scored according to how well they fit each motif in the signature. Another advantage of PRINTS is that it allows the user to search both PRINTS and PROSITE simultaneously. A detailed description of the use of PRINTS can be found in the following reference: Attwood et al., Nucleic Acids Res., 25:212 [1997].

III. Nucleic Acid Sequences, Including Related, Variant, Altered and Extended Sequences

The invention encompasses nucleic acids, polypeptides encoded by the nucleic acid sequences, and polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) variants that retain the biological or other functional activity of the polypeptide of interest. A preferred polypeptide variant is one having at least 80%, and more preferably 90%, amino acid sequence identity to the amino acid sequence of interest. A most preferred polypeptide variant is one having at least 95% amino acid sequence identity to the polypeptide of interest.
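By way of illustration only, the identity thresholds above may be applied to a pair of already-aligned amino acid sequences as in the following Python sketch. The function names are hypothetical, and the sketch makes two assumptions not stated in the text: gaps are written as `-`, and percent identity is computed over every aligned column in which at least one sequence has a residue. Other conventions for the denominator exist and give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both sequences is ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared

def variant_tier(identity: float) -> str:
    """Map a percent identity onto the 80%, 90% and 95% tiers named in the text."""
    if identity >= 95.0:
        return "most preferred"
    if identity >= 90.0:
        return "more preferred"
    if identity >= 80.0:
        return "preferred"
    return "below stated thresholds"
```

For example, two aligned five-residue peptides differing at one position are 80% identical under this convention and fall in the first tier.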

In particularly preferred embodiments, the invention encompasses the polynucleotides comprising SEQ ID NOs:1-571. In particularly preferred embodiments, the nucleic acids are operably linked to an exogenous promoter (and in most preferred embodiments to a plant promoter) or present in a vector.

It will be appreciated by those skilled in the art that as a result of the degeneracy of the genetic code, a multitude of nucleotide sequences encoding a given polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention), some bearing minimal homology to the nucleotide sequences of any known and naturally occurring gene, may be produced. Thus, the invention contemplates each and every possible variation of nucleotide sequence that could be made by selecting combinations based on possible codon choices. These combinations are made in accordance with the standard triplet genetic code as applied to the nucleotide sequence of the naturally occurring polypeptide, and all such variations are to be considered as being specifically disclosed.
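By way of illustration only, the number of nucleotide sequences that encode a given peptide under the standard triplet genetic code may be counted, and for short peptides enumerated, as in the following Python sketch. The function names are hypothetical; the sketch assumes one-letter amino acid codes and does not append a stop codon.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to every codon that encodes it.
CODONS_FOR = defaultdict(list)
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR[aa].append("".join(codon))
CODONS_FOR = dict(CODONS_FOR)

def _codon_choices(peptide: str) -> list:
    """Return the list of synonymous codons for each residue of the peptide."""
    choices = []
    for aa in peptide.upper():
        if aa not in CODONS_FOR or aa == "*":
            raise ValueError(f"not a standard amino acid: {aa!r}")
        choices.append(CODONS_FOR[aa])
    return choices

def count_encoding_sequences(peptide: str) -> int:
    """Number of distinct coding sequences: product of codon counts per residue."""
    return prod(len(codons) for codons in _codon_choices(peptide))

def enumerate_encoding_sequences(peptide: str) -> list:
    """List every coding sequence; practical only for short peptides."""
    return ["".join(combo) for combo in product(*_codon_choices(peptide))]
```

Because the count is a product over residues, it grows very quickly with peptide length: a tripeptide of leucine, serine and arginine, each of which has six codons, already has 216 possible coding sequences.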

Although nucleotide sequences which encode a given polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) and its variants are preferably capable of hybridizing to the nucleotide sequence of the naturally occurring polypeptide under appropriately selected conditions of stringency, it may be advantageous to produce nucleotide sequences encoding the polypeptide or its derivatives possessing a substantially different codon usage. Codons may be selected to increase the rate at which expression of the peptide occurs in a particular prokaryotic or eukaryotic host in accordance with the frequency with which particular codons are utilized by the host. Other reasons for substantially altering the nucleotide sequence encoding a polypeptide and its derivatives without altering the encoded amino acid sequences include the production of RNA transcripts having more desirable properties, such as a greater half-life, than transcripts produced from the naturally occurring sequence.

The invention also encompasses production of DNA sequences, or portions thereof, which encode a polypeptide and its derivatives, entirely by synthetic chemistry. After production, the synthetic sequence may be inserted into any of the many available expression vectors and cell systems using reagents that are well known in the art. Moreover, synthetic chemistry may be used to introduce mutations into a sequence encoding a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) or any portion thereof.

Also encompassed by the invention are polynucleotide sequences that are capable of hybridizing to SEQ ID NOs:1-571 under various conditions of stringency (for example, low to high stringency). Hybridization conditions are based on the melting temperature T_(m) of the nucleic acid binding complex or probe, as taught in Wahl and Berger, Methods Enzymol., 152:399 [1987] and Kimmel, Methods Enzymol., 152:507 [1987], and may be used at a defined stringency.

Altered nucleic acid sequences encoding a polypeptide include deletions, insertions, or substitutions of different nucleotides resulting in a polynucleotide that encodes the same or a functionally equivalent polypeptide. The encoded protein may also contain deletions, insertions, or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent polypeptide. Deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues as long as the biological activity of the polypeptide is retained. For example, negatively charged amino acids may include aspartic acid and glutamic acid; positively charged amino acids may include lysine and arginine; and amino acids with uncharged polar head groups having similar hydrophilicity values may include leucine, isoleucine, and valine; glycine and alanine; asparagine and glutamine; serine and threonine; phenylalanine and tyrosine.

Also included within the scope of the present invention are alleles of the genes encoding polypeptides. As used herein, an “allele” or “allelic sequence” is an alternative form of the gene which may result from at least one mutation in the nucleic acid sequence. Alleles may result in altered mRNAs or polypeptides whose structure or function may or may not be altered. Any given gene may have none, one, or many allelic forms. Common mutational changes which give rise to alleles are generally ascribed to natural deletions, additions, or substitutions of nucleotides. Each of these types of changes may occur alone, or in combination with the others, one or more times in a given sequence.

Methods for DNA sequencing which are well known and generally available in the art may be used to practice any embodiments of the invention. The methods may employ such enzymes as the Klenow fragment of DNA polymerase I, SEQUENASE (US Biochemical Corporation, Cleveland, Ohio), TAQ polymerase (U.S. Biochemical Corporation, Cleveland, Ohio), thermostable T7 polymerase (Amersham Pharmacia Biotech, Chicago, Ill.), or combinations of recombinant polymerases and proofreading exonucleases such as the ELONGASE amplification system (Life Technologies, Rockville, Md.). Preferably, the process is automated with machines such as the MICROLAB 2200 (Hamilton Company, Reno, Nev.), PTC200 DNA Engine thermal cycler (MJ Research, Watertown, Mass.) and the ABI 377 DNA sequencer (Perkin Elmer).

The nucleic acid sequences encoding a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) may be extended utilizing a partial nucleotide sequence and employing various methods known in the art to detect upstream sequences such as promoters and regulatory elements. For example, one method which may be employed, “restriction-site” PCR, uses universal primers to retrieve unknown sequence adjacent to a known locus (Sarkar, PCR Methods Applic. 2:318 [1993]). In particular, genomic DNA is first amplified in the presence of a primer to a linker sequence and a primer specific to the known region. The amplified sequences are then subjected to a second round of PCR with the same linker primer and another specific primer internal to the first one. Products of each round of PCR are transcribed with an appropriate RNA polymerase and sequenced using reverse transcriptase.

Inverse PCR may also be used to amplify or extend sequences using divergent primers based on a known region (Triglia et al., Nucleic Acids Res. 16:8186 [1988]). The primers may be designed using OLIGO 4.06 primer analysis software (National Biosciences Inc., Plymouth, Minn.), or another appropriate program, to be 22-30 nucleotides in length, to have a GC content of 50% or more, and to anneal to the target sequence at temperatures about 68-72° C. The method uses several restriction enzymes to generate a suitable fragment in the known region of a gene. The fragment is then circularized by intramolecular ligation and used as a PCR template.
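The length and GC-content criteria stated above lend themselves to a simple screen. The following sketch is an assumption-laden illustration, not a substitute for primer analysis software such as OLIGO: the function names are hypothetical, and the annealing temperature criterion (about 68-72° C.) is deliberately not evaluated because it depends on the thermodynamic model and reaction conditions.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a non-empty primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def meets_primer_criteria(primer: str,
                          min_len: int = 22,
                          max_len: int = 30,
                          min_gc: float = 0.5) -> bool:
    """Check the length (22-30 nt) and GC content (50% or more) criteria.

    The length test runs first, so an empty string is rejected before
    the GC fraction is computed.
    """
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc


if __name__ == "__main__":
    candidate = "GCGCATGCGCATGCGCATGCGCAT"  # illustrative 24-mer
    print(len(candidate), round(gc_fraction(candidate), 3),
          meets_primer_criteria(candidate))
```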

Another method which may be used is capture PCR which involves PCRamplification of DNA fragments adjacent to a known sequence in human andyeast artificial chromosome DNA (Lagerstrom et al., PCR Methods Applic.1:111 [1991]). In this method, multiple restriction enzyme digestionsand ligations may also be used to place an engineered double-strandedsequence into an unknown portion of the DNA molecule before performingPCR.

Another method which may be used to retrieve unknown sequences is thatof Parker et al., Nucleic Acids Res., 19:3055 [1991]. Additionally, onemay use PCR, nested primers, and PROMOTERFINDER DNA Walking Kitslibraries (Clontech, Palo Alto, Calif.) to walk in genomic DNA. Thisprocess avoids the need to screen libraries and is useful in findingintron/exon junctions.

When screening for full-length cDNAs, it is preferable to use librariesthat have been size-selected to include larger cDNAs. Also,random-primed libraries are preferable, in that they will contain moresequences which contain the 5′ regions of genes. Use of a randomlyprimed library may be especially preferable for situations in which anoligo d(T) library does not yield a full-length cDNA. Genomic librariesmay be useful for extension of sequence into the 5′ and 3′non-transcribed regulatory regions.

Capillary electrophoresis systems which are commercially available (forexample, from PE Biosystems, Inc., Foster City, Calif.) may be used toanalyze the size or confirm the nucleotide sequence of sequencing or PCRproducts. In particular, capillary sequencing may employ flowablepolymers for electrophoretic separation, four different fluorescent dyes(one for each nucleotide) which are laser activated, and detection ofthe emitted wavelengths by a charge coupled device camera. Output/lightintensity may be converted to electrical signal using appropriatesoftware (for example, GENOTYPER and SEQUENCE NAVIGATOR from PEBiosystems, Foster City, Calif.) and the entire process from loading ofsamples to computer analysis and electronic data display may be computercontrolled. Capillary electrophoresis is especially preferable for thesequencing of small pieces of DNA which might be present in limitedamounts in a particular sample.

It is contemplated that the nucleic acids disclosed herein can be utilized as starting nucleic acids for directed evolution. In some embodiments, artificial evolution is performed by random mutagenesis (for example, by utilizing error-prone PCR to introduce random mutations into a given coding sequence). This method requires that the frequency of mutation be finely tuned. As a general rule, beneficial mutations are rare, while deleterious mutations are common. This is because the combination of a deleterious mutation and a beneficial mutation often results in an inactive enzyme. The ideal number of base substitutions for a targeted gene is usually between 1.5 and 5 (Moore and Arnold, Nat. Biotech., 14:458-67 [1996]; Leung et al., Technique, 1:11-15 [1989]; Eckert and Kunkel, PCR Methods Appl., 1:17-24 [1991]; Caldwell and Joyce, PCR Methods Appl., 2:28-33 [1992]; and Zhao and Arnold, Nuc. Acids. Res., 25:1307-08 [1997]). After mutagenesis, the resulting clones are selected for desirable activity. Successive rounds of mutagenesis and selection are often necessary to develop enzymes with desirable properties. It should be noted that only the useful mutations are carried over to the next round of mutagenesis.

In other embodiments of the present invention, the polynucleotides of the present invention are used in gene shuffling or sexual PCR procedures (for example, Smith, Nature, 370:324-25 [1994]; U.S. Pat. Nos. 5,837,458; 5,830,721; 5,811,238; and 5,733,731). Gene shuffling involves random fragmentation of several mutant DNAs followed by their reassembly by PCR into full length molecules. Examples of various gene shuffling procedures include, but are not limited to, assembly following DNase treatment, the staggered extension process (STEP), and random priming in vitro recombination. In the DNase mediated method, DNA segments isolated from a pool of positive mutants are cleaved into random fragments with DNaseI and subjected to multiple rounds of PCR with no added primer. The lengths of random fragments approach that of the uncleaved segment as the PCR cycles proceed, resulting in mutations present in different clones becoming mixed and accumulating in some of the resulting sequences. Multiple cycles of selection and shuffling have led to the functional enhancement of several enzymes (Stemmer, Nature, 370:389-91 [1994]; Stemmer, Proc. Natl. Acad. Sci. USA, 91:10747-51 [1994]; Crameri et al., Nat. Biotech., 14:315-19 [1996]; Zhang et al., Proc. Natl. Acad. Sci. USA, 94:4504-09 [1997]; and Crameri et al., Nat. Biotech., 15:436-38 [1997]).

IV. Vectors, Engineering, and Expression of Sequences

In another embodiment of the invention, polynucleotide sequences orfragments thereof which encode polypeptides, or fusion proteins orfunctional equivalents thereof, may be used in recombinant DNA moleculesto direct expression of a polypeptide in appropriate host cells. Due tothe inherent degeneracy of the genetic code, other DNA sequences whichencode substantially the same or a functionally equivalent amino acidsequence may be produced and these sequences may be used to clone andexpress polypeptides (for example, a polypeptide encoded by a nucleicacid of the present invention).

As will be understood by those of skill in the art, it may be advantageous to produce nucleotide sequences possessing non-naturally occurring codons. For example, codons preferred by a particular prokaryotic or eukaryotic host can be selected to increase the rate of protein expression or to produce a recombinant RNA transcript having desirable properties, such as a half-life which is longer than that of a transcript generated from the naturally occurring sequence.
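Selecting host-preferred codons amounts to back-translating an amino acid sequence with one chosen codon per residue. The sketch below shows the idea under stated assumptions: the `PREFERRED_CODONS` table is hypothetical and purely illustrative (each codon is a valid codon for its amino acid in the standard genetic code, but the choice of which synonymous codon is "preferred" is not drawn from any measured codon usage table), and the function name is likewise an assumption.

```python
# Hypothetical preference table: one valid standard-code codon per residue.
# A real application would derive this from the codon usage of the chosen host.
PREFERRED_CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop
}


def back_translate(protein: str, table: dict = PREFERRED_CODONS) -> str:
    """Return a DNA coding sequence using one preferred codon per residue."""
    try:
        return "".join(table[residue] for residue in protein.upper())
    except KeyError as exc:
        raise ValueError(f"no codon defined for residue {exc.args[0]!r}") from None


if __name__ == "__main__":
    print(back_translate("MKV*"))
```

Because the genetic code is degenerate, many different tables would encode the same polypeptide; only the table changes from host to host.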

The nucleotide sequences of the present invention can be engineeredusing methods generally known in the art in order to alter thepolypeptide sequences for a variety of reasons, including but notlimited to, alterations which modify the cloning, processing, and/orexpression of the gene product. DNA shuffling by random fragmentationand PCR reassembly of gene fragments and synthetic oligonucleotides maybe used to engineer the nucleotide sequences. For example, site-directedmutagenesis may be used to insert new restriction sites, alterglycosylation patterns, change codon preference, produce splicevariants, or introduce mutations, and so forth.

In another embodiment of the invention, natural, modified, or recombinant nucleic acid sequences encoding a polypeptide may be ligated to a heterologous sequence to encode a fusion protein. For example, to screen peptide libraries for inhibitors of the polypeptide's activity (for example, enzymatic activity), it may be useful to encode a chimeric protein that can be recognized by a commercially available antibody. A fusion protein may also be engineered to contain a cleavage site located between the polypeptide encoding sequence and the heterologous protein sequence, so that the polypeptide of interest may be cleaved and purified away from the heterologous moiety.

In another embodiment, sequences encoding a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention) may be synthesized, in whole or in part, using chemical methods well known in the art (See for example, Caruthers et al., Nucl. Acids Res. Symp. Ser. 215 [1980]; Horn et al., Nucl. Acids Res. Symp. Ser. 225 [1980]). Alternatively, the protein itself may be produced using chemical methods to synthesize the amino acid sequence of the polypeptide of interest (for example, a polypeptide encoded by a nucleic acid of the present invention), or a portion thereof. For example, peptide synthesis can be performed using various solid-phase techniques (Roberge et al., Science 269:202 [1995]) and automated synthesis may be achieved, for example, using the ABI 431A peptide synthesizer (PE Corporation, Norwalk, Conn.).

The newly synthesized peptide may be substantially purified bypreparative high performance liquid chromatography (See for example,Creighton, T. (1983) Proteins, Structures and Molecular Principles, WHFreeman and Co., New York, N.Y.). The composition of the syntheticpeptides may be confirmed by amino acid analysis or sequencing (forexample, the Edman degradation procedure; or Creighton, supra).Additionally, the amino acid sequence of the polypeptide of interest orany part thereof, may be altered during direct synthesis and/or combinedusing chemical methods with sequences from other proteins, or any partthereof, to produce a variant polypeptide.

In order to express a biologically active polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention), the nucleotide sequences encoding the polypeptide or functional equivalents may be inserted into an appropriate expression vector, that is, a vector which contains the necessary elements for the transcription and translation of the inserted coding sequence.

Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding polypeptides (for example, a polypeptide encoded by a nucleic acid of the present invention) and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. Such techniques are described in Sambrook et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview, N.Y., and Ausubel, F. M. et al. (1989) Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y.

A variety of expression vector/host systems may be utilized to containand express sequences encoding a polypeptide of interest. These include,but are not limited to, microorganisms such as bacteria transformed withrecombinant bacteriophage, plasmid, or cosmid DNA expression vectors;yeast transformed with yeast expression vectors; insect cell systemsinfected with virus expression vectors (for example, baculovirus); plantcell systems transformed with virus expression vectors (for example,cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV; brome mosaicvirus) or with bacterial expression vectors (for example, Ti or pBR322plasmids); or animal cell systems.

The “control elements” or “regulatory sequences” are those non-translated regions of the vector (enhancers, promoters, 5′ and 3′ untranslated regions) which interact with host cellular proteins to carry out transcription and translation. Such elements may vary in their strength and specificity. Depending on the vector system and host utilized, any number of suitable transcription and translation elements, including constitutive and inducible promoters, may be used. For example, when cloning in bacterial systems, inducible promoters such as the hybrid lacZ promoter of the BLUESCRIPT phagemid (Stratagene, La Jolla, Calif.) or PSPORT1 plasmid (Life Technologies, Inc., Rockville, Md.) and the like may be used. The baculovirus polyhedrin promoter may be used in insect cells. Promoters or enhancers derived from the genomes of plant cells (for example, heat shock, RUBISCO, and storage protein genes) or from plant viruses (for example, viral promoters or leader sequences) may be cloned into the vector. In mammalian cell systems, promoters from mammalian genes or from mammalian viruses are preferable. If it is necessary to generate a cell line that contains multiple copies of the sequence encoding a polypeptide, vectors based on SV40 or EBV may be used with an appropriate selectable marker.

In bacterial systems, a number of expression vectors may be selected depending upon the use intended for the polypeptide of interest. For example, when large quantities of the polypeptide are needed for the induction of antibodies, vectors which direct high level expression of fusion proteins that are readily purified may be used. Such vectors include, but are not limited to, the multifunctional E. coli cloning and expression vectors such as BLUESCRIPT phagemid (Stratagene, La Jolla, Calif.), in which the sequence encoding the polypeptide of interest may be ligated into the vector in frame with sequences for the amino-terminal Met and the subsequent 7 residues of beta-galactosidase so that a hybrid protein is produced; pIN vectors (Van Heeke and Schuster, J. Biol. Chem. 264:5503 [1989]); and the like. pGEX vectors (Promega Corporation, Madison, Wis.) may also be used to express foreign polypeptides as fusion proteins with glutathione S-transferase (GST). In general, such fusion proteins are soluble and can easily be purified from lysed cells by adsorption to glutathione-agarose beads followed by elution in the presence of free glutathione. Proteins made in such systems may be designed to include heparin, thrombin, or factor XA protease cleavage sites so that the cloned polypeptide of interest can be released from the GST moiety at will.

In the yeast Saccharomyces cerevisiae, a number of vectors containingconstitutive or inducible promoters such as alpha factor, alcoholoxidase, and PGH may be used. For reviews, See for example, Ausubel etal. (supra) and Grant et al., Methods Enzymol. 153:516 [1987].

In cases where plant expression vectors are used, the expression ofsequences encoding polypeptides may be driven by any of a number ofpromoters. In a preferred embodiment, plant vectors are created using arecombinant plant virus containing a recombinant plant viral nucleicacid, as described in PCT publication WO 96/40867. Subsequently, therecombinant plant viral nucleic acid which contains one or morenon-native nucleic acid sequences may be transcribed or expressed in theinfected tissues of the plant host and the product of the codingsequences may be recovered from the plant, as described in WO 99/36516.

An important feature of this embodiment is the use of recombinant plantviral nucleic acids which contain one or more non-native subgenomicpromoters capable of transcribing or expressing adjacent nucleic acidsequences in the plant host and which result in replication and localand/or systemic spread in a compatible plant host. The recombinant plantviral nucleic acids have substantial sequence homology to plant viralnucleotide sequences and may be derived from an RNA, DNA, cDNA or achemically synthesized RNA or DNA. A partial listing of suitable virusesis described below.

The first step in producing recombinant plant viral nucleic acids according to this particular embodiment is to modify the nucleotide sequences of the plant viral nucleotide sequence by known conventional techniques such that one or more non-native subgenomic promoters are inserted into the plant viral nucleic acid without destroying the biological function of the plant viral nucleic acid. The native coat protein coding sequence may be deleted in some embodiments, placed under the control of a non-native subgenomic promoter in other embodiments, or retained in a further embodiment. If it is deleted or otherwise inactivated, a non-native coat protein gene is inserted under control of one of the non-native subgenomic promoters, or optionally under control of the native coat protein gene subgenomic promoter. The non-native coat protein is capable of encapsidating the recombinant plant viral nucleic acid to produce a recombinant plant virus. Thus, the recombinant plant viral nucleic acid contains a coat protein coding sequence, which may be a native or a non-native coat protein coding sequence, under control of one of the native or non-native subgenomic promoters. The coat protein is involved in the systemic infection of the plant host.

Some of the viruses which meet this requirement include viruses from the tobamovirus group such as Tobacco Mosaic virus (TMV), Ribgrass Mosaic Virus (RGM), Cowpea Mosaic virus (CMV), Alfalfa Mosaic virus (AMV), Cucumber Green Mottle Mosaic virus watermelon strain (CGMMV-W) and Oat Mosaic virus (OMV) and viruses from the brome mosaic virus group such as Brome Mosaic virus (BMV), broad bean mottle virus and cowpea chlorotic mottle virus. Additional suitable viruses include Rice Necrosis virus (RNV), and geminiviruses such as tomato golden mosaic virus (TGMV), Cassava latent virus (CLV) and maize streak virus (MSV). However, the invention should not be construed as limited to using these particular viruses, but rather the method of the present invention is contemplated to include all plant viruses at a minimum.

Other embodiments of plant vectors used for the expression of sequences encoding polypeptides include, for example, viral promoters such as the 35S and 19S promoters of CaMV used alone or in combination with the omega leader sequence from TMV (Takamatsu, EMBO J. 6:307 [1987]). Alternatively, plant promoters such as the small subunit of RUBISCO or heat shock promoters may be used (Coruzzi et al., EMBO J. 3:1671 [1984]; Broglie et al., Science 224:838 [1984]; and Winter et al., Results Probl. Cell Differ. 17:85 [1991]). These constructs can be introduced into plant cells by direct DNA transformation or pathogen-mediated transfection. Such techniques are described in a number of generally available reviews (See for example, Hobbs, S. or Murry, L. E. in McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York, N.Y.; pp. 191-196).

An insect system may also be used to express polypeptides (for example,a polypeptide encoded by a nucleic acid of the present invention). Forexample, in one such system, Autographa californica nuclear polyhedrosisvirus (AcNPV) is used as a vector to express foreign genes in Spodopterafrugiperda cells or in Trichoplusia larvae. The sequences encoding apolypeptide of interest may be cloned into a non-essential region of thevirus, such as the polyhedrin gene, and placed under control of thepolyhedrin promoter. Successful insertion of the nucleic acid sequenceencoding the polypeptide of interest will render the polyhedrin geneinactive and produce recombinant virus lacking coat protein. Therecombinant viruses may then be used to infect, for example, S.frugiperda cells or Trichoplusia larvae in which the polypeptide may beexpressed (Engelhard et al., Proc. Nat. Acad. Sci. 91:3224 [1994]).

In mammalian host cells, a number of viral-based expression systems maybe utilized. In cases where an adenovirus is used as an expressionvector, sequences encoding polypeptides may be ligated into anadenovirus transcription/translation complex consisting of the latepromoter and tripartite leader sequence. Insertion in a non-essential E1or E3 region of the viral genome may be used to obtain a viable viruswhich is capable of expressing the polypeptide in infected host cells(Logan and Shenk, Proc. Natl. Acad. Sci., 81:3655 [1984]). In addition,transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer,may be used to increase expression in mammalian host cells.

Specific initiation signals may also be used to achieve more efficient translation of sequences encoding the polypeptide of interest. Such signals include the ATG initiation codon and adjacent sequences. In cases where sequences encoding the polypeptide of interest, its initiation codon, and upstream sequences are inserted into the appropriate expression vector, no additional transcriptional or translational control signals may be needed. However, in cases where only coding sequence, or a portion thereof, is inserted, exogenous translational control signals including the ATG initiation codon should be provided. Furthermore, the initiation codon should be in the correct reading frame to ensure translation of the entire insert. Exogenous translational elements and initiation codons may be of various origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers which are appropriate for the particular cell system which is used, such as those described in the literature (Scharf et al., Results Probl. Cell Differ., 20:125 [1994]).
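The requirement that the initiation codon be in the correct reading frame can be checked computationally before a construct is assembled. The following is a minimal sketch under simplifying assumptions: the insert is supplied as a DNA string beginning at the intended ATG, the standard stop codons apply, and the function name is hypothetical. It does not model adjacent sequence context or any host-specific translational signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def has_intact_reading_frame(insert: str) -> bool:
    """True if the insert starts with ATG, is a whole number of codons,
    and contains no stop codon before its final codon."""
    seq = insert.upper()
    if not seq.startswith("ATG") or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # The last codon may or may not be a stop; only internal stops
    # would truncate translation of the insert.
    return all(codon not in STOP_CODONS for codon in codons[:-1])


if __name__ == "__main__":
    print(has_intact_reading_frame("ATGAAATAA"))
    print(has_intact_reading_frame("ATGTAAAAA"))
```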

In addition, a host cell strain may be chosen for its ability tomodulate the expression of the inserted sequences or to process theexpressed protein in the desired fashion. Such modifications of thepolypeptide include, but are not limited to, acetylation, carboxylation,glycosylation, phosphorylation, lipidation, and acylation.Post-translational processing which cleaves a “prepro” form of theprotein may also be used to facilitate correct insertion, folding and/orfunction. Different host cells such as CHO, HeLa, MDCK, HEK293, andWI38, which have specific cellular machinery and characteristicmechanisms for such post-translational activities, may be chosen toensure the correct modification and processing of the foreign protein.

For long-term, high-yield production of recombinant proteins, stableexpression is preferred. For example, cell lines which stably expressthe polypeptide of interest (for example, a polypeptide encoded by anucleic acid of the present invention) may be transformed usingexpression vectors which may contain viral origins of replication and/orendogenous expression elements and a selectable marker gene on the sameor on a separate vector. Following the introduction of the vector, cellsmay be allowed to grow for 1-2 days in an enriched media before they areswitched to selective media. The purpose of the selectable marker is toconfer resistance to selection, and its presence allows growth andrecovery of cells which successfully express the introduced sequences.Resistant clones of stably transformed cells may be proliferated usingtissue culture techniques appropriate to the cell type.

Any number of selection systems may be used to recover transformed cell lines. These include, but are not limited to, the herpes simplex virus thymidine kinase (Wigler et al., Cell 11:223 [1977]) and adenine phosphoribosyltransferase (Lowy et al., Cell 22:817 [1980]) genes which can be employed in tk⁻ or aprt⁻ cells, respectively. Also, antimetabolite, antibiotic, or herbicide resistance can be used as the basis for selection; for example, dhfr, which confers resistance to methotrexate (Wigler et al., Proc. Natl. Acad. Sci., 77:3567 [1980]); npt, which confers resistance to the aminoglycosides neomycin and G-418 (Colbere-Garapin et al., J. Mol. Biol., 150:1 [1981]); and als or pat, which confer resistance to chlorsulfuron and phosphinothricin, respectively (Murry, supra). Additional selectable genes have been described, for example, trpB, which allows cells to utilize indole in place of tryptophan, or hisD, which allows cells to utilize histidinol in place of histidine (Hartman and Mulligan, Proc. Natl. Acad. Sci., 85:8047 [1988]). Recently, the use of visible markers has gained popularity with such markers as anthocyanins, β-glucuronidase (GUS) and its glucuronide substrate, and luciferase and its substrate luciferin, being widely used not only to identify transformants, but also to quantify the amount of transient or stable protein expression attributable to a specific vector system (Rhodes et al., Methods Mol. Biol., 55:121 [1995]).

Although the presence/absence of marker gene expression suggests thatthe gene of interest is also present, its presence and expression mayneed to be confirmed. For example, if the sequence encoding apolypeptide is inserted within a marker gene sequence, recombinant cellscontaining sequences encoding the polypeptide can be identified by theabsence of marker gene function. Alternatively, a marker gene can beplaced in tandem with a sequence encoding the polypeptide under thecontrol of a single promoter. Expression of the marker gene in responseto induction or selection usually indicates expression of the tandemgene as well.

Alternatively, host cells which contain the nucleic acid sequenceencoding the polypeptide of interest (for example, a polypeptide encodedby a nucleic acid of the present invention) and express the polypeptidemay be identified by a variety of procedures known to those of skill inthe art. These procedures include, but are not limited to, DNA-DNA orDNA-RNA hybridizations and protein bioassay or immunoassay techniqueswhich include membrane, solution, or chip based technologies for thedetection and/or quantification of nucleic acid or protein.

The presence of polynucleotide sequences encoding a polypeptide of interest (for example, a polypeptide encoded by a nucleic acid of the present invention) can be detected by DNA-DNA or DNA-RNA hybridization or amplification using probes or portions or fragments of polynucleotides encoding the polypeptide. Nucleic acid amplification based assays involve the use of oligonucleotides or oligomers based on the sequences encoding the polypeptide to detect transformants containing DNA or RNA encoding the polypeptide. As used herein, “oligonucleotides” or “oligomers” refer to a nucleic acid sequence of at least about 10 nucleotides and as many as about 60 nucleotides, preferably about 15 to 30 nucleotides, and more preferably about 20-25 nucleotides, which can be used as a probe or amplimer.
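The nested length ranges in this definition can be captured in a small helper for screening candidate probes or amplimers. This sketch is illustrative only: the function name and the returned labels are hypothetical, and the "about" qualifiers in the definition are simplified here to hard cutoffs.

```python
def oligo_length_class(oligo: str) -> str:
    """Classify an oligonucleotide by length, narrowest range first.

    Assumption: the approximate bounds (about 10-60, about 15-30,
    about 20-25 nucleotides) are treated as exact inclusive limits.
    """
    n = len(oligo)
    if 20 <= n <= 25:
        return "more preferred"
    if 15 <= n <= 30:
        return "preferred"
    if 10 <= n <= 60:
        return "acceptable"
    return "outside range"


if __name__ == "__main__":
    for length in (5, 12, 18, 22, 45, 70):
        print(length, oligo_length_class("A" * length))
```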

A variety of protocols for detecting and measuring the expression of a polypeptide (for example, a polypeptide encoded by a nucleic acid of the present invention), using either polyclonal or monoclonal antibodies specific for the protein, are known in the art. Examples include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and fluorescence activated cell sorting (FACS). A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes on the polypeptide is preferred, but a competitive binding assay may be employed. These and other assays are described, among other places, in Hampton et al., 1990; Serological Methods, a Laboratory Manual, APS Press, St Paul, Minn. and Maddox et al., J. Exp. Med., 158:1211 [1983].

A wide variety of labels and conjugation techniques are known by thoseskilled in the art and may be used in various nucleic acid and aminoacid assays. Means for producing labeled hybridization or PCR probes fordetecting sequences related to polynucleotides encoding a polypeptide ofinterest include oligonucleotide labeling, nick translation,end-labeling or PCR amplification using a labeled nucleotide.Alternatively, the sequences encoding the polypeptide, or any portionsthereof may be cloned into a vector for the production of an mRNA probe.Such vectors are known in the art, are commercially available, and maybe used to synthesize RNA probes in vitro by addition of an appropriateRNA polymerase such as T7, T3, or SP6 and labeled nucleotides. Theseprocedures may be conducted using a variety of commercially availablekits from Pharmacia & Upjohn (Kalamazoo, Mich.), Promega Corporation(Madison, Wis.) and U.S. Biochemical Corp. (Cleveland, Ohio). Suitablereporter molecules or labels, which may be used, include radionuclides,enzymes, fluorescent, chemiluminescent, or chromogenic agents as well assubstrates, cofactors, inhibitors, magnetic particles, and the like.

Host cells transformed with nucleotide sequences encoding a polypeptide of interest may be cultured under conditions suitable for the expression and recovery of the protein from cell culture. The protein produced by a recombinant cell may be secreted or contained intracellularly depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing polynucleotides which encode the polypeptide of interest (for example, a polypeptide encoded by a nucleic acid of the present invention) may be designed to contain signal sequences which direct secretion of the polypeptide through a prokaryotic or eukaryotic cell membrane. Other recombinant constructions may be used to join sequences encoding the polypeptide to nucleotide sequence encoding a polypeptide domain which will facilitate purification of soluble proteins. Such purification facilitating domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex Corp., Seattle, Wash.). The inclusion of cleavable linker sequences such as those specific for Factor XA or enterokinase (available from Invitrogen, San Diego, Calif.) between the purification domain and the polypeptide of interest may be used to facilitate purification. One such expression vector provides for expression of a fusion protein containing the polypeptide of interest and a nucleic acid encoding 6 histidine residues preceding a thioredoxin or an enterokinase cleavage site. The histidine residues facilitate purification on IMAC (immobilized metal ion affinity chromatography) as described in Porath et al., Prot. Exp. Purif., 3:263 [1992] while the enterokinase cleavage site provides a means for purifying the polypeptide from the fusion protein. A discussion of vectors which contain fusion proteins is provided in Kroll et al., DNA Cell Biol., 12:441 [1993].

In addition to recombinant production, fragments of the polypeptide of interest may be produced by direct peptide synthesis using solid-phase techniques (Merrifield, J. Am. Chem. Soc., 85:2149 [1963]). Protein synthesis may be performed using manual techniques or by automation. Automated synthesis may be achieved, for example, using the Applied Biosystems 431A peptide synthesizer (Perkin Elmer). Various fragments of the polypeptide may be chemically synthesized separately and combined using chemical methods to produce the full length molecule.

V. Altering of Gene Expression

It is contemplated that the polynucleotides of the present invention (for example, SEQ ID NOs:1-571) may be utilized to either increase or decrease the level of corresponding mRNA and/or protein in transfected cells as compared to the levels in wild-type cells. Accordingly, in some embodiments, expression in plants by the methods described above leads to the overexpression of the polypeptide of interest in transgenic plants, plant tissues, or plant cells. The present invention is not limited to any particular mechanism. Indeed, an understanding of a mechanism is not required to practice the present invention. However, it is contemplated that overexpression of the polynucleotides of the present invention will alter the expression of the gene comprising the nucleic acid sequence of the present invention.

In other embodiments of the present invention, the polynucleotides are utilized to decrease the level of the protein or mRNA of interest in transgenic plants, plant tissues, or plant cells as compared to wild-type plants, plant tissues, or plant cells. One method of reducing protein expression utilizes expression of antisense transcripts (for example, U.S. Pat. Nos. 6,031,154; 5,453,566; 5,451,514; 5,859,342; and 4,801,340). Antisense RNA has been used to inhibit plant target genes in a tissue-specific manner (for example, Van der Krol et al., Biotechniques 6:958-976 [1988]). Antisense inhibition has been shown using the entire cDNA sequence as well as a partial cDNA sequence (for example, Sheehy et al., Proc. Natl. Acad. Sci. USA 85:8805-8809 [1988]; Cannon et al., Plant Mol. Biol. 15:39-47 [1990]). There is also evidence that 3′ non-coding sequence fragments and 5′ coding sequence fragments, containing as few as 41 base-pairs of a 1.87 kb cDNA, can play important roles in antisense inhibition (Ch'ng et al., Proc. Natl. Acad. Sci. USA 86:10006-10010 [1989]).

Accordingly, in some embodiments, the nucleic acids of the present invention (for example, SEQ ID NOs: 1-571, and fragments and variants thereof) are oriented in a vector and expressed so as to produce antisense transcripts. To accomplish this, a nucleic acid segment from the desired gene is cloned and operably linked to a promoter such that the antisense strand of RNA will be transcribed. The expression cassette is then transformed into plants and the antisense strand of RNA is produced. The nucleic acid segment to be introduced generally will be substantially identical to at least a portion of the endogenous gene or genes to be repressed. The sequence, however, need not be perfectly identical to inhibit expression. The vectors of the present invention can be designed such that the inhibitory effect applies to other proteins within a family of genes exhibiting homology or substantial homology to the target gene.

Furthermore, for antisense suppression, the introduced sequence also need not be full length relative to either the primary transcription product or fully processed mRNA. Generally, higher homology can be used to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need not have the same intron or exon pattern, and homology of non-coding segments may be equally effective. Normally, a sequence of between about 30 or 40 nucleotides and up to about the full length of the coding region should be used, although a sequence of at least about 100 nucleotides is preferred, a sequence of at least about 200 nucleotides is more preferred, and a sequence of at least about 500 nucleotides is especially preferred.
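By way of illustration only, the orientation and length considerations described above can be sketched in a few lines of Python. The function names, the made-up 39-nucleotide fragment, and the use of 30 nucleotides as the lower bound are assumptions introduced for this sketch and are not part of the disclosed methods; the length tiers simply restate the approximate ranges given above.

```python
# Illustrative sketch only; names and the example fragment are hypothetical.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.translate(COMPLEMENT)[::-1]

def length_preference(n_nucleotides: int) -> str:
    """Classify a fragment length using the approximate tiers stated in the text."""
    if n_nucleotides >= 500:
        return "especially preferred"
    if n_nucleotides >= 200:
        return "more preferred"
    if n_nucleotides >= 100:
        return "preferred"
    if n_nucleotides >= 30:  # assumption: lower bound taken as about 30
        return "usable"
    return "below stated range"

# Made-up 39-nt fragment standing in for a segment of a target gene.
fragment = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

# Cloning the segment in reverse orientation behind a promoter means the
# transcript corresponds to the reverse complement of the sense strand.
antisense_insert = reverse_complement(fragment)
print(antisense_insert, length_preference(len(fragment)))
```

A real construct design would also account for sequence identity to related family members, which this sketch does not attempt.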

Catalytic RNA molecules or ribozymes can also be used to inhibit expression of the target gene or genes. It is possible to design ribozymes that specifically pair with virtually any target RNA and cleave the phosphodiester backbone at a specific location, thereby functionally inactivating the target RNA. In carrying out this cleavage, the ribozyme is not itself altered, and is thus capable of recycling and cleaving other molecules, making it a true enzyme. The inclusion of ribozyme sequences within antisense RNAs confers RNA-cleaving activity upon them, thereby increasing the activity of the constructs.

A number of classes of ribozymes have been identified. One class of ribozymes is derived from a number of small circular RNAs which are capable of self-cleavage and replication in plants. The RNAs replicate either alone (viroid RNAs) or with a helper virus (satellite RNAs). Examples include RNAs from avocado sunblotch viroid and the satellite RNAs from tobacco ringspot virus, lucerne transient streak virus, velvet tobacco mottle virus, Solanum nodiflorum mottle virus and subterranean clover mottle virus. The design and use of target RNA-specific ribozymes is described in Haseloff, et al., Nature 334:585-591 (1988).

Another method of reducing protein expression utilizes the phenomenon of cosuppression or gene silencing (for example, U.S. Pat. Nos. 6,063,947; 5,686,649; and 5,283,184). The phenomenon of cosuppression has also been used to inhibit plant target genes in a tissue-specific manner. Cosuppression of an endogenous gene using a full-length cDNA sequence as well as a partial cDNA sequence (730 bp of a 1770 bp cDNA) is known (for example, Napoli et al., Plant Cell 2:279-289 [1990]; van der Krol et al., Plant Cell 2:291-299 [1990]; Smith et al., Mol. Gen. Genetics 224:477-481 [1990]). Accordingly, in some embodiments the nucleic acids (for example, SEQ ID NOs: 1-571, and fragments and variants thereof) from one species of plant are expressed in another species of plant to effect cosuppression of a homologous gene. Generally, where inhibition of expression is desired, some transcription of the introduced sequence occurs. The effect may occur where the introduced sequence contains no coding sequence per se, but only intron or untranslated sequences homologous to sequences present in the primary transcript of the endogenous sequence. The introduced sequence generally will be substantially identical to the endogenous sequence intended to be repressed. This minimal identity will typically be greater than about 65%, but a higher identity might exert a more effective repression of expression of the endogenous sequences. Substantially greater identity of more than about 80% is preferred, though about 95% to absolute identity would be most preferred. As with antisense regulation, the effect should apply to any other proteins within a similar family of genes exhibiting homology or substantial homology.

For cosuppression, the introduced sequence in the expression cassette, needing less than absolute identity, also need not be full length, relative to either the primary transcription product or fully processed mRNA. This may be preferred to avoid concurrent production of some plants which are overexpressers. A higher identity in a shorter than full length sequence compensates for a longer, less identical sequence. Furthermore, the introduced sequence need not have the same intron or exon pattern, and identity of non-coding segments will be equally effective. Normally, a sequence of the size ranges noted above for antisense regulation is used.

VI. Expression of Sequences Producing Stunting Phenotype

The present invention provides nucleic acid sequences involved in stunting of growth in plants. Plants transformed with viral vectors comprising the nucleic acid sequences of the present invention were screened for a stunting phenotype (see Examples 10 and 18). Accordingly, in some embodiments, the present invention provides nucleic acid sequences that produce a stunting phenotype when expressed in plants. The present invention is not limited to the particular nucleic acid sequences listed. Indeed, it is contemplated that nucleic acid sequences which hybridize to the listed nucleic acid sequences under conditions ranging from low to high stringency, and which also cause the stunting phenotype, are also suitable. These sequences are conveniently identified by insertion into GENEWARE vectors and expression in plants as detailed in the examples. Accordingly, in particularly preferred embodiments, the sequences that produce a stunting phenotype include, but are not limited to, SEQ ID NOs: 47, 58, 336, 288, 291, 297, 302, 304, 313, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332 and sequences that hybridize to these sequences under conditions of low to high stringency. In some embodiments, the sequences are operably linked to a plant promoter or provided in a vector as described in more detail above. Furthermore, the sequences can be expressed in either sense or antisense orientation. In particularly preferred embodiments, the sequences are at least 30 nucleotides in length up to the full length of the corresponding gene. It is contemplated that sequences of less than full length (for example, greater than about 30 nucleotides) are useful for down regulation of gene expression via antisense or cosuppression. Suitable sequences are selected by chemically synthesizing the sequences, cloning into GENEWARE expression vectors, expressing in plants, and selecting plants with a stunting phenotype.

VII. Characterization of Metabolic Hits

In some embodiments, the present invention provides novel methods for the characterization of the chemical nature of genetic modifications made in tobacco plants using GENEWARE viral vector technology or other expression technologies. The methods comprise separating fractions from leaf extracts of plants transfected with nucleic acid sequences of the present invention using chromatographic and mass spectroscopy techniques, followed by searching a series of databases.

In some embodiments, the characterization is performed on samples identified as metabolic hits (as described above and in Example 19). In some embodiments, samples are labeled with a bar code to facilitate tracking and database searching. Samples are first separated using chromatography (for example, gas chromatography (GC)), followed by mass spectroscopy (MS). The present invention is not limited to a particular GC/MS system. Any suitable analysis system may be utilized, including but not limited to, those commercially available from Agilent Technologies, Hewlett Packard, Leap Technologies, and APEX.

In preferred embodiments, internal standards are added to the samples prior to analysis. The internal standards utilized are specific to the leaf fraction analyzed. For example, in some embodiments, Fraction 1 (see Example 19 for a description of the components of the fractions) is analyzed using the internal standards Pentacosane and Hexatriacontane, Fraction 2 is analyzed using Undecanoic acid, methyl ester and Tetracosanoic acid, methyl ester as internal standards, and Fraction 3 is analyzed using n-Octyl-β-D-Glucopyranoside. In some embodiments, certain fractions (for example, those containing lipids or highly polar water-soluble molecules) are derivatized prior to analysis to make certain components more amenable to gas chromatography. The present invention is not limited to the analysis of the fractions described herein. Any separated solution containing biological macromolecules (for example, proteins, lipids, and carbohydrates) may be analyzed using the methods of the present invention. GC/MS may be performed using any suitable protocol, including but not limited to, those described in Example 20 below. In preferred embodiments, instrument performance standards are analyzed along with fractionated samples (see Example 20 for examples of suitable standards).

In some embodiments, Sample and Reference data sets are next processed using the Bioinformatics computer program Maxwell (The Dow Chemical Company, Midland, Mich.). The principal elements of the program are 1) Data Reduction, 2) two-dimensional Peak Matching, 3) Quantitative Peak Differentiation (Determination of Relative Quantitative Change), 4) Peak Identification, 5) Data Sorting, and 6) Customized Reporting.

The program first queries the user for the filenames of the Reference data set and Sample data set(s) to compare against the Reference. The program then integrates the Total Ion Chromatogram (TIC) of the data sets (for example, using Agilent Technologies HP ChemStation integrator parameters). In preferred embodiments, parameters for integration are determined by the analyst. The corresponding raw peak areas are then normalized to the respective Internal Standard peak area. Peak tables from the Reference and each Sample are then generated. The peak tables comprise retention time (RT), retention index (RI), which is the retention time relative to the Internal Standard RT, raw peak areas, peak areas normalized to the Internal Standard, and other pertinent information.
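The data reduction step can be pictured with the following minimal Python sketch. It is not the Maxwell program; the `Peak` class, the `build_peak_table` function, and the choice to compute RI as a simple ratio of peak RT to Internal Standard RT are assumptions made here for illustration, since the exact form of the RI calculation is not specified above.

```python
from dataclasses import dataclass

@dataclass
class Peak:
    rt: float        # retention time of the integrated peak
    raw_area: float  # raw integrated TIC peak area

def build_peak_table(peaks, istd_rt, istd_area):
    """Build a peak table with RT, RI, raw area, and normalized area.

    istd_rt and istd_area are the retention time and raw area of the
    Internal Standard peak in the same data set.
    """
    table = []
    for p in peaks:
        table.append({
            "rt": p.rt,
            "ri": p.rt / istd_rt,  # assumption: RI taken as a simple RT ratio
            "raw_area": p.raw_area,
            "norm_area": p.raw_area / istd_area,
        })
    return table

# Made-up numbers for illustration only.
sample_table = build_peak_table(
    [Peak(rt=5.0, raw_area=200.0), Peak(rt=12.5, raw_area=1000.0)],
    istd_rt=10.0,
    istd_area=400.0,
)
for row in sample_table:
    print(row)
```

Normalizing each data set to its own Internal Standard is what allows Sample and Reference peak areas to be compared on a common scale in the later steps.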

In preferred embodiments, following peak identification, one or more (preferably two) filtering steps are employed. In some embodiments, filtering criteria are established by the analyst and must be met before a peak is further analyzed. In some embodiments, the first filtering criterion is based upon a peak's normalized area. All normalized peaks having values below the Limit of Processing for Peak Matching (LOP-PM) are considered to be “background.” In preferred embodiments, background peaks are not carried forward for any type of mathematical calculation or spectral comparison.

In some embodiments, an initial peak-matching step is conducted, comprising comparing the Sample peak table to the Reference peak table and pairing peaks based upon their respective RI values matching one another (within a given variable window). In some embodiments, the next step in the peak matching routine comprises a spectral comparison of Sample and Reference peaks that have been chromatographically matched. The spectral matching is performed using a mass spectral cross-correlation algorithm within the Agilent Technologies HP ChemStation software. The cross-correlation algorithm generates an equivalence value based upon spectral “fit” that is used to determine whether the chromatographically matched peaks are spectrally similar or not. This equivalence value is referred to as the MS-XCR value and must meet or exceed a predetermined value for a pair of peaks to be “MATCHED,” which means they appear to be the same compound in both the Reference and the Sample. The MS-XCR value can also be used to judge peak purity.

In preferred embodiments, the two-dimensional peak matching process is repeated until all potential peak matches have been processed. At the end of the process, peaks are categorized into two categories, MATCHED and UNMATCHED.
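A rough Python sketch of two-dimensional peak matching follows. It first pairs peaks whose RI values agree within a window and then requires a spectral similarity score to meet a threshold. The cosine similarity used here is a stand-in chosen for this sketch; it is not the ChemStation cross-correlation algorithm and does not reproduce MS-XCR values. All function names, the greedy best-score pairing, and the example numbers are likewise assumptions.

```python
import math

def spectral_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Stand-in for the MS-XCR equivalence value; not the same algorithm.
    """
    mzs = set(spec_a) | set(spec_b)
    dot = sum(spec_a.get(m, 0.0) * spec_b.get(m, 0.0) for m in mzs)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def match_peaks(sample, reference, ri_window, min_similarity):
    """Pair Sample and Reference peaks by RI window, then by spectrum."""
    matched, used_ref = [], set()
    for s in sample:
        best = None  # (reference index, similarity score)
        for j, r in enumerate(reference):
            if j in used_ref or abs(s["ri"] - r["ri"]) > ri_window:
                continue  # first dimension: chromatographic (RI) match
            score = spectral_similarity(s["spectrum"], r["spectrum"])
            if score >= min_similarity and (best is None or score > best[1]):
                best = (j, score)  # second dimension: spectral match
        if best is not None:
            used_ref.add(best[0])
            matched.append((s, reference[best[0]], best[1]))
    matched_ids = {id(m[0]) for m in matched}
    unmatched_sample = [s for s in sample if id(s) not in matched_ids]
    unmatched_reference = [r for j, r in enumerate(reference) if j not in used_ref]
    return matched, unmatched_sample, unmatched_reference

# Made-up peaks for illustration only.
sample = [{"ri": 1.00, "spectrum": {73: 100.0, 147: 50.0}},
          {"ri": 1.50, "spectrum": {57: 100.0}}]
reference = [{"ri": 1.01, "spectrum": {73: 100.0, 147: 50.0}},
             {"ri": 2.00, "spectrum": {91: 100.0}}]
matched, new_in_sample, missing_in_sample = match_peaks(sample, reference, 0.02, 0.9)
print(len(matched), len(new_in_sample), len(missing_in_sample))
```

The two unmatched lists correspond to the two kinds of UNMATCHED peaks: those present only in the Sample and those present only in the Reference.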

In some embodiments, a second filtering criterion is next invoked. The second filtering step is also based upon the normalized area of the MATCHED or UNMATCHED peak. For a peak to be reported and further processed, its normalized area must meet or exceed the predetermined Limit of Processing for Sorting (LOP-SRT).
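Both filtering criteria reduce to the same comparison of a normalized area against a limit, as the short Python sketch below suggests. The function name and the numeric limits are hypothetical; the actual LOP-PM and LOP-SRT values are established by the analyst and are not given here.

```python
def filter_by_limit(rows, limit):
    """Keep rows whose normalized area meets or exceeds the given limit."""
    return [row for row in rows if row["norm_area"] >= limit]

# Hypothetical limits, for illustration only.
LOP_PM = 0.01   # Limit of Processing for Peak Matching
LOP_SRT = 0.05  # Limit of Processing for Sorting

rows = [{"norm_area": 0.004}, {"norm_area": 0.02}, {"norm_area": 0.30}]
for_matching = filter_by_limit(rows, LOP_PM)            # background peaks dropped
for_reporting = filter_by_limit(for_matching, LOP_SRT)  # applied after matching
print(len(for_matching), len(for_reporting))
```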

Peaks that are UNMATCHED are immediately flagged as different. UNMATCHED peaks are of two types. There are those that are reported in the Reference but appear to be absent in the Sample (based upon criteria for quantitation and reporting). These peaks are designated with a percent change of “-100 percent” and the description “UNMATCHED IN SAMPLE.” The second type of peaks are those that are not reported in the Reference (again, based upon criteria for quantitation and reporting) but were reported in the Sample, thus appearing to be “new” peaks. These peaks are designated with a percent change of “100 percent” and the description “NEW PEAK UNMATCHED IN NULL.”

In preferred embodiments, MATCHED peaks are processed further for relative quantitative differentiation. This quantitative differentiation is expressed as a percent change of the Sample peak area relative to the area of the Reference peak. A predetermined threshold for change must be observed for the change to be determined biochemically and statistically significant. The change threshold is based upon previously observed biological and analytical variability factors. Only changes above the threshold for change are reported.
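The relative quantitative differentiation can be written as percent change = 100 × (Sample area − Reference area) / Reference area. The Python sketch below applies this to MATCHED peaks and records the fixed designations used for UNMATCHED peaks. The function names, the example threshold, and the decision to compare the absolute value of the change against the threshold (so that decreases are reported as well as increases) are assumptions made for this sketch.

```python
# Fixed designations for the two types of UNMATCHED peaks.
UNMATCHED_IN_SAMPLE = (-100.0, "UNMATCHED IN SAMPLE")
NEW_PEAK = (100.0, "NEW PEAK UNMATCHED IN NULL")

def percent_change(sample_area, reference_area):
    """Percent change of the Sample area relative to the Reference area."""
    return 100.0 * (sample_area - reference_area) / reference_area

def reportable_changes(matched_pairs, change_threshold):
    """Return (name, percent change) for pairs exceeding the threshold.

    matched_pairs: iterable of (name, sample_norm_area, reference_norm_area).
    Assumption: the threshold is applied to the magnitude of the change.
    """
    report = []
    for name, sample_area, reference_area in matched_pairs:
        change = percent_change(sample_area, reference_area)
        if abs(change) > change_threshold:
            report.append((name, change))
    return report

# Made-up normalized areas and a hypothetical 20 percent threshold.
pairs = [("peak A", 150.0, 100.0), ("peak B", 105.0, 100.0), ("peak C", 40.0, 100.0)]
print(reportable_changes(pairs, 20.0))
```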

In some embodiments, following filtration, peaks are next processed through a peak identification process. In some embodiments, the mass spectra of the peaks are first searched against mass spectral plant metabolite libraries (for example, including but not limited to, the database developed by Function Discovery Laboratories, The Dow Chemical Company, Midland, Mich.). The equivalence value assigned to the library match is used as an indication of a proper identification. In some embodiments, in order to provide additional confirmation of the identity of a peak, or to suggest other possibilities, library hits are searched further against a Biotechnology database (for example, including but not limited to, the database developed by Function Discovery Laboratories, The Dow Chemical Company, Midland, Mich.). In preferred embodiments, the Biotechnology database incorporates chemical structures.

In some embodiments, the Chemical Abstracts Service (CAS) number of compounds identified from the library is searched against those contained in the database. If a match is found, the CAS number in the database is then correlated to the data acquisition method for that record. If the method matches, the program then compares the retention index (RI) of the component against the value contained in the database for that given method. Should the RIs match (within a given window of variability), then the peak identity is given a high degree of certainty. In some embodiments, components in the Sample that are not identified by this process are assigned a unique identifier for tracking.
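The three-stage confirmation (CAS number, then acquisition method, then RI within a window) can be sketched in Python as follows. The record layout, the function name, the placeholder CAS number, the method label, and the RI window are all hypothetical and chosen only to make the logic concrete.

```python
def identification_status(cas, method, sample_ri, database, ri_window):
    """Return an identification status for one library hit.

    A hit is given high certainty only when a database record agrees on
    CAS number, data acquisition method, and RI (within ri_window).
    """
    for record in database:
        if record["cas"] != cas:
            continue  # stage 1: CAS number must match
        if record["method"] != method:
            continue  # stage 2: acquisition method must match
        if abs(record["ri"] - sample_ri) <= ri_window:
            return "peak identity of high certainty"  # stage 3: RI agrees
    return "criteria not met"

# Hypothetical database record; the CAS number is a placeholder.
database = [{"cas": "0000-00-0", "method": "F1-GCMS", "ri": 1.234}]

print(identification_status("0000-00-0", "F1-GCMS", 1.240, database, 0.01))
print(identification_status("0000-00-0", "F1-GCMS", 1.300, database, 0.01))
```

Components that come back as not identified would then be given a unique tracking identifier, a step this sketch leaves out.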

In preferred embodiments, the program then sorts the data to generate a preliminary report referred to as an analyst report. The analyst report includes, but is not limited to, PBM algorithm match quality value (equivalence value), RT, Normalized Peak Area, RI (Sample), RI (database), Peak Identification status [peak identity of high certainty (peaks identified by the program based on the pre-established criteria) or criteria not met (program did not positively identify the component)], Component Name, CAS Number, Mass Spectral Library (containing spectrum most closely matched to that of the component), Unknown ID (unique identifier used to track unidentified components), MS-XCR value, Relative % Change, Notes (MATCHED/UNMATCHED), and other miscellaneous information. In some embodiments, the analyst report is reviewed by an analyst, who edits the report to generate a modified report for further processing by the program.

For compounds that were derivatized prior to analysis, the compound names in the modified analyst report (MAR) are those of the derivatives. In some embodiments, to accurately reflect the true components of these fractions, the MAR is further processed using information contained in an additional database that cross-references the observed derivatized compound to that of the original, underivatized “parent” compound by way of their respective CAS numbers and replaces derivatives with parent names and information for the final report. In addition, any unidentified components are assigned a “999999-99-9” CAS number for the final report. In some embodiments, the Modified Analyst Report also contains a HIT Score of 0, 1, or 2. The value is assigned by the analyst to the data set of the Sample aliquot based on criteria, including but not limited to, 0 = no FDL data on Sample; 1 = FDL data collected, Sample not FDL HIT; and 2 = FDL data collected, Sample is FDL HIT. An FDL HIT is defined as a reportable percent change (modification) observed in a Sample relative to Reference in a component of biochemical significance.
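The substitution of parent compounds for derivatives in the final report can be illustrated with the Python sketch below. Only the “999999-99-9” CAS number for unidentified components comes from the description above; the function name, the row layout, the cross-reference mapping, and the placeholder CAS numbers and names are hypothetical.

```python
UNIDENTIFIED_CAS = "999999-99-9"  # CAS number assigned to unidentified components

def finalize_report(mar_rows, derivative_to_parent):
    """Replace derivative entries with parent entries for the final report.

    mar_rows: list of dicts with "name" and "cas" (cas is None if unidentified).
    derivative_to_parent: maps a derivative CAS number to the parent's
    {"cas": ..., "name": ...}; stands in for the cross-reference database.
    """
    final = []
    for row in mar_rows:
        row = dict(row)  # copy so the MAR itself is left unchanged
        if row.get("cas") is None:
            row["cas"] = UNIDENTIFIED_CAS
        elif row["cas"] in derivative_to_parent:
            parent = derivative_to_parent[row["cas"]]
            row["cas"], row["name"] = parent["cas"], parent["name"]
        final.append(row)
    return final

# Placeholder CAS numbers and names, for illustration only.
cross_reference = {"0000-00-1": {"cas": "0000-00-0", "name": "parent compound X"}}
mar = [{"name": "derivative of X", "cas": "0000-00-1"},
       {"name": "Unknown 17", "cas": None}]
for row in finalize_report(mar, cross_reference):
    print(row)
```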

In some embodiments, an electronic copy of the final report is entered into the Nautilus LIMS system (BLIMS) and subsequently into eBRAD (Biotech database). In some embodiments, the program generates a hardcopy of the pinpointed TIC and the respective mass spectrum of each component that was reported to have changed.

VIII. Sequences Identified from Metabolic Screens

In some embodiments, the present invention provides nucleic acid sequences that alter the metabolism of plants when expressed in plants. Plants transformed with viral vectors comprising the nucleic acid sequences of the present invention were screened for an altered metabolism phenotype (see Examples 19 and 20). A number of such sequences were identified. Accordingly, in some embodiments, the present invention provides nucleic acid sequences that produce an altered metabolism phenotype when expressed in plants. The present invention is not limited to the particular nucleic acid sequences listed. Indeed, it is contemplated that nucleic acid sequences which hybridize to the listed nucleic acid sequences under conditions ranging from low to high stringency, and which also cause the altered metabolism phenotype, are also suitable. These sequences are conveniently identified by insertion into GENEWARE vectors and expression in plants as detailed in the examples. Accordingly, in particularly preferred embodiments, the sequences that produce an altered metabolism phenotype include, but are not limited to, SEQ ID NOs: 281-304, 306-312, 314-324, 326-330 and 333339-343 and sequences that hybridize to these sequences under conditions of low to high stringency. In some embodiments, the sequences are operably linked to a plant promoter or provided in a vector as described in more detail above. The present invention also contemplates plants transformed or transfected with these sequences as well as seeds from such transfected plants. Furthermore, the sequences can be expressed in either sense or antisense orientation. In particularly preferred embodiments, the sequences are at least 30 nucleotides in length up to the full length of the corresponding gene. It is contemplated that sequences of less than full length (for example, greater than about 30 nucleotides) are useful for down regulation of gene expression via antisense or cosuppression. Suitable sequences are selected by chemically synthesizing the sequences, cloning into GENEWARE expression vectors, expressing in plants, and selecting plants with an altered metabolism phenotype.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of acids in plants. Examples of acids that can be altered according to the present invention include, but are not limited to, citric acid, carbamic acid, glyceric acid, phosphoric acid, 11-eicosenoic acid (11Z), caffeic acid, chlorogenic acid, malic acid, inositol, and terephthalic acid. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: SEQ ID NO:335 (170074), SEQ ID NO:336 (175736), SEQ ID NO:282 (23242), SEQ ID NO:283 (23869), SEQ ID NO:289 (25026), SEQ ID NO:292 (25118), SEQ ID NO:293 (25124), SEQ ID NO:296 (25164), SEQ ID NO:297 (25170), SEQ ID NO:298 (25176), SEQ ID NO:299 (25196), SEQ ID NO:306 (27430), SEQ ID NO:311 (27819), SEQ ID NO:315 (30913), SEQ ID NO:318 (37186), SEQ ID NO:321 (45801), SEQ ID NO:323 (45808), SEQ ID NO:324 (45820), SEQ ID NO:328 (45855), and SEQ ID NO:329 (45864). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of acid production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of fatty acids in plants. Examples of fatty acids that can be altered according to the present invention include, but are not limited to, 9-octadecadienoic acid (9Z), eicosanoic acid, hexadecanoic acid, octadecanoic acid, 9,12,15-octadecatrienoic acid, 9,12-octadecadienoic acid, 7,10,13-docosatrienoic acid (7Z,10Z,13Z), 7,10,13-hexadecatrienoic acid, docosanoic acid, heptadecanoic acid, 9-hexadecenoic acid, tetradecanoic acid, and 9-octadecenoic acid. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 175736 (SEQ ID NO:336), 21604 (SEQ ID NO:281), 23242 (SEQ ID NO:282), 23869 (SEQ ID NO:283), 25009 (SEQ ID NO:286), 25011 (SEQ ID NO:287), 25015 (SEQ ID NO:288), 25062 (SEQ ID NO:290), 25104 (SEQ ID NO:291), 25133 (SEQ ID NO:294), 25144 (SEQ ID NO:295), 25170 (SEQ ID NO:297), 25176 (SEQ ID NO:298), 25196 (SEQ ID NO:299), 25421 (SEQ ID NO:300), 25431 (SEQ ID NO:302), 27440 (SEQ ID NO:307), 27460 (SEQ ID NO:309), 27468 (SEQ ID NO:310), 27819 (SEQ ID NO:311), 30307 (SEQ ID NO:314), 30913 (SEQ ID NO:315), 34136 (SEQ ID NO:316), 37186 (SEQ ID NO:318), 37188 (SEQ ID NO:319), 45801 (SEQ ID NO:321), 45804 (SEQ ID NO:322), 45864 (SEQ ID NO:329) and 56465 (SEQ ID NO:333). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of fatty acid production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of branched fatty acids in plants. Examples of branched fatty acids that can be altered according to the present invention include, but are not limited to, 16-methyl-heptadecanoic acid and 14-methyl-hexadecanoic acid. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 175736 (SEQ ID NO:336), 23242 (SEQ ID NO:282), 23869 (SEQ ID NO:283), 25009 (SEQ ID NO:286), 25015 (SEQ ID NO:288), 25062 (SEQ ID NO:290), 25104 (SEQ ID NO:291), 25133 (SEQ ID NO:294), 25144 (SEQ ID NO:295), 25170 (SEQ ID NO:297), 25196 (SEQ ID NO:299), 25431 (SEQ ID NO:302), 27440 (SEQ ID NO:307), 27460 (SEQ ID NO:309), 27468 (SEQ ID NO:310), 30307 (SEQ ID NO:314), 30913 (SEQ ID NO:315), 37188 (SEQ ID NO:319), and 45801 (SEQ ID NO:321). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of branched fatty acid production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of hydroxy fatty acids in plants. Examples of hydroxy fatty acids that can be altered according to the present invention include, but are not limited to, malic acid, 2,3,4-trihydroxy-butanoic acid, 3,4-dihydroxy-butanoic acid, 2,3-dihydroxypropyl-9,12-octadecadienoic acid ester, and 2,3-bis(acetyloxy)propyl-eicosanoic acid ester. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 105039 (SEQ ID NO:334), 23242 (SEQ ID NO:282), 23869 (SEQ ID NO:283), 25026 (SEQ ID NO:289), 27430 (SEQ ID NO:306), 27819 (SEQ ID NO:311), 30913 (SEQ ID NO:315), 37188 (SEQ ID NO:319), and 45808 (SEQ ID NO:323). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of hydroxy fatty acid production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of alcohols in plants. Examples of alcohols that can be altered according to the present invention include, but are not limited to, inositol. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 25124 (SEQ ID NO:293), 25170 (SEQ ID NO:297), 25176 (SEQ ID NO:298), 25118 (SEQ ID NO:299), 37186 (SEQ ID NO:318), 37188 (SEQ ID NO:319), 45801 (SEQ ID NO:321), 45808 (SEQ ID NO:323), 45820 (SEQ ID NO:324), 45855 (SEQ ID NO:328), and 45864 (SEQ ID NO:329). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of alcohol production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of alkaloids and other bases in plants. Examples of alkaloids and other bases that can be altered according to the present invention include, but are not limited to, 1,4-butanediamine. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 25124 (SEQ ID NO:293), 25164 (SEQ ID NO:296), 25170 (SEQ ID NO:297), 27819 (SEQ ID NO:311), and 37186 (SEQ ID NO:318). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of alkaloid and other base production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of alkenes and alkynes in plants. Examples of alkenes and alkynes that can be altered according to the present invention include, but are not limited to, squalene and limonene. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 23242 (SEQ ID NO:282), 25124 (SEQ ID NO:293), and 25196 (SEQ ID NO:299). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of alkene and alkyne production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of amino acids and related compounds in plants. Examples of amino acids and related compounds that can be altered according to the present invention include, but are not limited to, histidine, leucine, methionine, proline, glycine, alanine, serine, aspartic acid, glutamic acid, lysine, cysteine, tyrosine, phenylalanine, valine, threonine, arginine, glutamine, tryptophan, isoleucine, and 5-oxo-proline. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 105039 (SEQ ID NO:334), 182206 (SEQ ID NO:280), 170074 (SEQ ID NO:335), 175736 (SEQ ID NO:336), 21604 (SEQ ID NO:281), 23242 (SEQ ID NO:282), 23869 (SEQ ID NO:283), 25004 (SEQ ID NO:284), 25008 (SEQ ID NO:285), 25015 (SEQ ID NO:288), 25026 (SEQ ID NO:289), 25057 (SEQ ID NO:338), 25080 (SEQ ID NO:337), 25124 (SEQ ID NO:293), 25164 (SEQ ID NO:296), 25170 (SEQ ID NO:297), 25176 (SEQ ID NO:298), 25196 (SEQ ID NO:299), 25425 (SEQ ID NO:301), 25431 (SEQ ID NO:302), 27410 (SEQ ID NO:303), 27424 (SEQ ID NO:304), 27459 (SEQ ID NO:308), 27460 (SEQ ID NO:309), 27468 (SEQ ID NO:310), 27819 (SEQ ID NO:311), 30913 (SEQ ID NO:315), 34442 (SEQ ID NO:317), 37186 (SEQ ID NO:318), 37188 (SEQ ID NO:319), 38919 (SEQ ID NO:320), 45801 (SEQ ID NO:321), 45808 (SEQ ID NO:323), 45820 (SEQ ID NO:324), 45850 (SEQ ID NO:326), 45853 (SEQ ID NO:327), 45855 (SEQ ID NO:328), 45864 (SEQ ID NO:329), and 45866 (SEQ ID NO:330). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of amino acid and related compounds production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of carbohydrates in plants. Examples of carbohydrates that can be altered according to the present invention include, but are not limited to, hexose, glucose, fructose, sucrose, galactose, and xylose. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 105039 (SEQ ID NO:334), 170074 (SEQ ID NO:335), 23242 (SEQ ID NO:282), 23869 (SEQ ID NO:283), 25026 (SEQ ID NO:289), 25124 (SEQ ID NO:293), 25164 (SEQ ID NO:296), 25170 (SEQ ID NO:297), 25196 (SEQ ID NO:299), 27430 (SEQ ID NO:306), 27819 (SEQ ID NO:311), 27864 (SEQ ID NO:312), 30913 (SEQ ID NO:315), 37186 (SEQ ID NO:318), 37188 (SEQ ID NO:319), 45801 (SEQ ID NO:321), 45808 (SEQ ID NO:323), 45820 (SEQ ID NO:324), 45855 (SEQ ID NO:328), and 45864 (SEQ ID NO:329). In preferred embodiments, expression of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of carbohydrate production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of esters in plants. Examples of esters that can be altered according to the present invention include, but are not limited to, 2-methyl-, 3-hydroxy-2,4,4-trimethylpentyl-propanoic acid ester. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 23869 (SEQ ID NO:283). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of ester production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of glycerides in plants. Examples of glycerides that can be altered according to the present invention include, but are not limited to, 9,12-octadecadienoic acid (9Z,12Z)-2,3-dihydroxypropyl ester, glycerol palmitate, and glycerol phosphate. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 21604 (SEQ ID NO:281), 23242 (SEQ ID NO:282), 27819 (SEQ ID NO:311), 30913 (SEQ ID NO:315), 45808 (SEQ ID NO:323), and 56465 (SEQ ID NO:333). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of glyceride production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of hydrocarbons in plants. Examples of hydrocarbons that can be altered according to the present invention include, but are not limited to, 2-methyl-triacontane and squalene. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 23242 (SEQ ID NO:282), 25196 (SEQ ID NO:299), 27410 (SEQ ID NO:303), 37188 (SEQ ID NO:319), 38919 (SEQ ID NO:320), 45808 (SEQ ID NO:323), and 56465 (SEQ ID NO:333). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of hydrocarbon production in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of phenols and related compounds in plants. Examples of phenols and related compounds that can be altered according to the present invention include, but are not limited to, quinic acid and 4-hydroxy-benzoic acid. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 170074 (SEQ ID NO:335), 23242 (SEQ ID NO:282), 25118 (SEQ ID NO:292), 25176 (SEQ ID NO:298), 25196 (SEQ ID NO:299), 27819 (SEQ ID NO:311), 30913 (SEQ ID NO:315), 37188 (SEQ ID NO:319), 45801 (SEQ ID NO:321), 45808 (SEQ ID NO:323), 45820 (SEQ ID NO:324), 45855 (SEQ ID NO:328), and 45864 (SEQ ID NO:329). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of the production of phenols and related compounds in a plant.

In some embodiments, the present invention provides methods and compositions for increasing, decreasing, or otherwise altering the production of sterols, oxygenated terpenes, and other isoprenoids in plants. Examples of sterols, oxygenated terpenes, and other isoprenoids that can be altered according to the present invention include, but are not limited to, solanesol, cycloartenol, alpha-tocopherol, gamma-tocopherol, alpha-tocopherol quinone, beta-tocopherol, stigmast-7-en-3-ol (3b,5a,24S), campesterol, cholesterol, beta-sitosterol, stigmasterol, 24-methylene-lophenol, 24-methylene-cycloartenol, 4,14-dimethyl-ergosta-8,24(28)-diene-3-ol, obtusifoliol, fucosterol, ergost-22-en-3-one, stigmasta-5,22-dien-3-ol, and 24-methyl-3-oxo-29-norlanostan-10,11,24-methylenecholesterol. The alterations in metabolic profile are preferably accomplished by expressing one or more of the following nucleic acid sequences (or sequences that hybridize thereto) in a plant: 105039 (SEQ ID NO:334), 21604 (SEQ ID NO:281), 23242 (SEQ ID NO:282), 23869 (SEQ ID NO:283), 25124 (SEQ ID NO:293), 25196 (SEQ ID NO:299), 27410 (SEQ ID NO:303), 27819 (SEQ ID NO:311), 30913 (SEQ ID NO:315), 37188 (SEQ ID NO:319), 38919 (SEQ ID NO:320), 45801 (SEQ ID NO:321), 45808 (SEQ ID NO:323), 45864 (SEQ ID NO:329), and 56465 (SEQ ID NO:333). In preferred embodiments, expression in plants of the sequences that hybridize to the preceding sequences also results in an increase, decrease, or alteration of sterol, oxygenated terpene, and other isoprenoid production in a plant.

In some other preferred embodiments, the present invention comprises other nucleic acid sequences that alter either fatty acid metabolism or the levels of certain metabolites when expressed in a plant. Sequences that altered fatty acid metabolism were identified using the FAME screen described below (see Example 12B). The FAME screen identifies the fatty acid composition of plant leaves that have been transformed with a viral vector comprising the nucleic acid sequences of the present invention. Sequences that alter levels of certain metabolites were identified using the metabolic screen described below (see Example 12E).

In some preferred embodiments, the present invention comprises SEQ ID NO: 94 and variants and orthologs thereof. Plants transformed with a viral vector comprising SEQ ID NO: 94 exhibited a stunted phenotype and had increased levels of 16:0 fatty acid methyl ester as identified by the FAME screen (see Example 12B). These plants were further analyzed using GC-MS (see Example 12E) to generate a metabolic profile. Gas chromatographs of leaf extracts were analyzed to identify compounds that were present at an increased level in transformed leaves relative to controls. The leaves exhibited increased levels of the following fatty acids: 18:1, 12:0, neophytadiene, 14:0, and 16:1. The leaves also had increased levels of inositol, phosphoric acid, malic acid, ribonic acid gamma-lactone, citric acid, quinic acid, and sugars. Furthermore, the plants were resistant to attack by insects (see Example 12C).

In some preferred embodiments, the present invention comprises SEQ ID NO: 43 and variants and orthologs thereof. Plants transformed with a viral vector comprising SEQ ID NO: 43 exhibited a stunted phenotype. The transfected plants were analyzed using GC-MS (see Example 12E) to generate a metabolic profile. Gas chromatographs of leaf extracts were analyzed to identify compounds that were present at an increased level in transformed leaves relative to controls. The plant leaves exhibited increased levels of glyceric acid, malic acid, ribonic acid gamma-lactone, quinic acid, and inositol.

SEQ ID NO: 43 was compared to known sequences using the BLAST search program. This sequence was found to have homology to the maize ferredoxin:thioredoxin reductase (FTR; See for example, Iwadate et al., Eur. J. Biochem., 241:121 [1996]). FTR is the essential enzyme of the light-dependent regulatory system controlling enzyme activities in photosynthetic plant cells. FTR, in the presence of ferredoxin and thioredoxin, catalyzes the activation of several photosynthetic enzymes, such as fructose-1,6-bisphosphatase and NADP-malate dehydrogenase (See for example, Tsugita et al., Protein Seq Data Anal., 4:9 [1991]; and Crawford et al., Arch Biochem. Biophys, 271:223 [1989]). The present invention is not limited to a particular mechanism. Indeed, an understanding of the mechanism of the present invention is not necessary to practice the present invention. Nonetheless, it is contemplated that disrupting the regulation of photosynthetic enzymes by disrupting the function of FTR is responsible for the stunting phenotype and metabolic profile changes observed in plants transformed with viral vectors comprising SEQ ID NO: 43.

In some preferred embodiments, the present invention comprises SEQ ID NO: 151 and variants and orthologs thereof. Plants transformed with a viral vector comprising SEQ ID NO: 151 exhibited a stunted phenotype and were found to have altered fatty acid metabolism as identified by the FAME screen (see Example 12B). The leaves also exhibited increased resistance to attack by insects (see Example 12C).

In some preferred embodiments, the present invention comprises SEQ ID NO: 52 and variants and orthologs thereof. Plants transformed with a viral vector comprising SEQ ID NO: 52 exhibited a stunted phenotype. The plants also exhibited altered fatty acid metabolism as identified by the FAME screen (see Example 12B).

SEQ ID NO: 52 was compared to known sequences using the BLAST search program. The sequence was found to have homology to the Arabidopsis thaliana psbW gene (See for example, GenBank Accession Nos. S60662 and X90769). PsbW encodes the W subunit of photosystem II. PsbW is a thylakoid membrane protein that is part of the core photosystem II complex (See for example, Thompson et al., J. Biol. Chem., 273:18979 [1998]; Barber and Kuhlbrandt, Curr Opin Struct Biol 4:469 [1999]). The present invention is not limited to a particular mechanism. Indeed, an understanding of the mechanism of the present invention is not necessary to practice the present invention. Nonetheless, it is contemplated that disrupting the function of a component of plant photosynthesis, such as psbW, would disrupt plant growth and lead to the observed stunting phenotype and disrupted fatty acid metabolism.

In some preferred embodiments, the present invention comprises SEQ ID NO: 49 and variants and orthologs thereof. Plants transformed with a viral vector comprising SEQ ID NO: 49 exhibited a stunted phenotype. The plants also exhibited altered levels of certain metabolites as evidenced by the metabolic screen.

In some preferred embodiments, the present invention comprises SEQ ID NO: 79 and variants and orthologs thereof. Plants transformed with a viral vector comprising SEQ ID NO: 79 exhibited a stunted phenotype. The plants were further analyzed using GC-MS (see Example 12E) to generate a metabolic profile. Gas chromatographs of leaf extracts were analyzed to identify compounds that were present at an increased level in transformed leaves relative to controls. The plant leaves exhibited increased levels of malic acid, aspartic acid, pyroglutamate, citric acid, and sucrose.

SEQ ID NO: 79 was compared to known sequences using the BLAST search program. SEQ ID NO: 79 was found to have homology to the A. thaliana H3 histone gene (See for example, Chaboute et al., Plant Mol. Biol., 8:179 [1987]). Histones are structural proteins involved in chromatin structure. The present invention is not limited to a particular mechanism. Indeed, an understanding of the mechanism of the present invention is not necessary to practice the present invention. Nonetheless, it is contemplated that disruption of chromosome structure is responsible for the stunting phenotype observed and for the increased level of certain metabolites.

IX. Identification of Homologs to Sequences

The present invention also provides homologs and variants of the sequences described above, including homologs and variants that may not hybridize to the sequences described above under conditions ranging from low to high stringency. In some preferred embodiments, the homologous and variant sequences are operably linked to an exogenous promoter. Table 1 provides BLAST search results from publicly available databases. The relevant sequences are identified by accession number in these databases. Table 2 contains the top blastx hits (identified by accession number) versus all the amino acid sequences in the Derwent biweekly database. Table 3 contains the top blastn hits (identified by accession number) versus all the nucleotide sequences in the Derwent biweekly database.

TABLE 1 Blast Search Results for Selected Databases

SEQ ID NO:  ID Number  Blast results
334  105039  sp|Q43848|TKTC_SOLTU TRANSKETOLASE, CHLOROPLAST PRECURSOR (TK) emb|CAA90427.1| (Z50099) transketolase precursor [Solanum tuberosum]
335  170474  gb|AAK27804.1|AC022457_7 (AC022457) hypothetical protein [Oryza sativa]
336  175736  pir||T14544 fructokinase (EC 2.7.1.4) - beet gb|AAA80675.1| (U37838) fructokinase [Beta vulgaris]
280  182206  pir||S58083 transketolase (EC 2.2.1.1) precursor - potato (fragment)
281  21604   pir||T45745 hypothetical protein F24M12.180 - Arabidopsis thaliana emb|CAB62636.1| (AL132980) putative protein [Arabidopsis thaliana]
282  23242   gb|AAC73034.1| (AC005824) hypothetical protein [Arabidopsis thaliana]
283  23869   pir||S76514 hypothetical protein - Synechocystis sp. (strain PCC 6803) dbj|BAA10360.1| (D64002) hypothetical protein [Synechocystis sp.]
284  25004   sp|O80934|Y230_ARATH PROTEIN AT2G37520, CHLOROPLAST PRECURSOR pir||T02532 hypothetical protein F13M22.16 - Arabidopsis thaliana gb|AAC23636.1| (AC004684) unknown protein [Arabidopsis thaliana]
285  25008   sp|O80934|Y230_ARATH PROTEIN AT2G37520, CHLOROPLAST PRECURSOR pir||T02532 hypothetical protein F13M22.16 - Arabidopsis thaliana gb|AAC23636.1| (AC004684) unknown protein [Arabidopsis thaliana]
286  25009   sp|P55748|CP22_HORVU SERINE CARBOXYPEPTIDASE II-2 PRECURSOR (CP-MII.2) gb|AAB31590.1| CP-MII.2 = serine carboxypeptidase [Hordeum vulgare = barley, cv. Alexis, aleurone, Peptide, 436 aa] emb|CAB59202.1| (X78878) serine carboxylase II-2 [Hordeum vulgare]
287  25011   sp|P74707|RF1_SYNY3 PEPTIDE CHAIN RELEASE FACTOR 1 (RF-1) pir||S76914 translation releasing factor RF-1 - Synechocystis sp. (strain PCC 6803) dbj|BAA18826.1| (D90917) peptide chain release factor [Synechocystis sp.]
288  25015   gb|AAG40037.1|AF324686_1 (AF324686) MSA6. [Arabidopsis thaliana] gb|AAG41441.1|AF326859_1 (AF326859) unknown protein [Arabidopsis thaliana] dbj|BAB01448.1| (AP000604) photosystem II 5 kD protein precursor [Arabidopsis thaliana] gb|AAK00364.1|AF339682_1 (AF339682) unknown protein [Arabidopsis thaliana]
289  25026   sp|P27521|CB24_ARATH CHLOROPHYLL A-B BINDING PROTEIN 4 PRECURSOR (LHCI TYPE III CAB-4) (LHCP) pir||T45707 CHLOROPHYLL A-B BINDING PROTEIN 4 PRECURSOR homolog - Arabidopsis thaliana gb|AAA32760.1| (M63931) light-harvesting chlorophyll a/b binding protein [Arabidopsis thaliana] emb|CAB61973.1| (AL132955) CHLOROPHYLL A-B BINDING PROTEIN 4 PRECURSOR homolog [Arabidopsis thaliana]
338  25057   gb|AAD25930.1|AF085279_3 (AF085279) hypothetical Cys-3-His zinc finger protein [Arabidopsis thaliana] gb|AAF18728.1|AC018721_3 (AC018721) putative CCCH-type zinc finger protein [Arabidopsis thaliana]
290  25062   ref|XP_011617.2| golgi transport complex 1 (90 kDa subunit) [Homo sapiens]
337  25080   gb|AAK52899.1|AF351125_1 (AF351125) gamma-aminobutyrate transaminase subunit precursor [Arabidopsis thaliana]
291  25104   sp|P15459|2SS3_ARATH 2S SEED STORAGE PROTEIN 3 PRECURSOR (2S ALBUMIN STORAGE PROTEIN) pir||NWMU3 2S albumin 3 precursor - Arabidopsis thaliana gb|AAA32745.1| (M22033) albumin 2S subunit 3 precursor [Arabidopsis thaliana] emb|CAA80868.1| (Z24744) 2S albumin isoform 3 [Arabidopsis thaliana] emb|CAB38846.1| (AL035680) NWMU3-2S albumin 3 precursor [Arabidopsis thaliana] emb|CAB79571.1| (AL161566) NWMU3-2S albumin 3 precursor [Arabidopsis thaliana]
292  25118   gb|AAG50838.1|AC073944_5 (AC073944) multispanning membrane protein, putative [Arabidopsis thaliana]
293  25124   gb|AAK25908.1|AF360198_1 (AF360198) putative ubiquitin carboxyl-terminal hydrolase [Arabidopsis thaliana]
294  25133   gb|AAF78265.1|AC020576_9 (AC020576) Contains similarity to aminoacylase from Sus scrofa domestica gi|S27010 and contains a peptidase M20 PF|01546 domain. ESTs gb|H76043, gb|AA394953, gb|AI995115, gb|AA651481 come from this gene. [Arabidopsis thaliana]
295  25144   dbj|BAB01489.1| (AB030033) AmiB [Dictyostelium discoideum]
296  25164   dbj|BAB02817.1| (AB024036) gene_id: MQC12.11~unknown protein [Arabidopsis thaliana]
297  25170   gb|AAF78388.1|AC069551_21 (AC069551) T10O22.12 [Arabidopsis thaliana]
298  25176   emb|CAB44317.1| (Y17842) lamin B receptor [Xenopus laevis]
299  25196   emb|CAC08341.1| (AL392174) lipoic acid synthase-like protein [Arabidopsis thaliana]
339  25414   emb|CAB52750.1| (AJ245632) photosystem I subunit VI precursor [Arabidopsis thaliana] gb|AAF29410.1|AC022354_9 (AC022354) photosystem I subunit VI precursor [Arabidopsis thaliana]
300  25421   gb|AAD40603.1|AF115283_22 (AF115283) preprotein translocase SecY [Leptospira interrogans]
301  25425   gb|AAC32114.1| (AF051209) CROC-1-like protein [Picea mariana]
302  25431   gb|AAF86550.1|AC069252_9 (AC069252) F2E2.14 [Arabidopsis thaliana]
303  27410   sp|P24636|TBB4_ARATH TUBULIN BETA-4 CHAIN pir||S68122 tubulin beta-4 chain - Arabidopsis thaliana gb|AAA32757.1| (M21415) beta-tubulin [Arabidopsis thaliana]
304  27424   pir||T01527 hypothetical protein A_IG005I10.23 - Arabidopsis thaliana gb|AAB62841.1| (AF013293) A_IG005I10.23 gene product [Arabidopsis thaliana] gb|AAF02799.1|AF195115_19 (AF195115) F5I10.23 gene product [Arabidopsis thaliana] emb|CAB80802.1| (AL161471) putative protein [Arabidopsis thaliana]
305  25427   pir||T49105 symbiosis-related like protein - Arabidopsis thaliana emb|CAA18101.1| (AL022140) symbiosis-related like protein [Arabidopsis thaliana] emb|CAB79153.1| (AL161556) symbiosis-related like protein [Arabidopsis thaliana]
306  27430   pir||T24470 hypothetical protein T04F8.8 - Caenorhabditis elegans emb|CAA91483.1| (Z66565) cDNA EST yk121f1.5 comes from this gene~cDNA EST yk145f11.3 comes from this gene~cDNA EST yk150b6.3 comes from this gene~cDNA EST yk150b6.5 comes from this gene~cDNA EST yk171h2.5 comes from this gene~cDNA EST yk205f7.3 comes from this gene
307  27440   gb|AAB03512.1| (L37749) hexokinase III [Homo sapiens]
308  27459   gb|AAF87848.1|AC073942_2 (AC073942) Contains similarity to a hypothetical protein T11I11.11 gi|6587865 from Arabidopsis thaliana BAC gb|AC012680
309  27460   sp|Q9ZAE3|RL18_THEMA 50S RIBOSOMAL PROTEIN L18 pir||D72248 ribosomal protein L18 - Thermotoga maritima (strain MSB8) gb|AAD36550.1|AE001798_15 (AE001798) ribosomal protein L18 [Thermotoga maritima]
310  27468   gb|AAF97342.1|AC023628_23 (AC023628) Putative MYB family transcription factor [Arabidopsis thaliana]
311  27819   pir||S56707 histone H3 homolog - common tobacco
312  27864   pir||T02580 hypothetical protein T16B24.14 - Arabidopsis thaliana gb|AAC28986.1| (AC004697) putative patatin protein [Arabidopsis thaliana]
313  30087   gb|AAC72288.1| (AF033204) putative pectin methylesterase [Arabidopsis thaliana]
314  30307   ref|NP_064380.1| phosphorylated adaptor for RNA export [Mus musculus] emb|CAB87994.1| (AJ276504) phosphorylated adaptor for RNA export [Mus musculus]
315  30913   gb|AAA73163.1| (M81126) synthetic fusion protein [synthetic construct]
316  34136   gb|AAB82617.1| (AC002387) unknown protein [Arabidopsis thaliana]
317  34442   pir||T48166 hypothetical protein T10O8.150 - Arabidopsis thaliana emb|CAB81927.1| (AL161746) putative protein [Arabidopsis thaliana]
318  37186   pir||T09015 transketolase (EC 2.2.1.1) precursor, chloroplast - spinach gb|AAD10219.1| (L76554) transketolase [Spinacia oleracea]
319  37188   pir||RGUS1M exonuclease REC1 (EC 3.1.-.-) - smut fungus (Ustilago maydis)
320  38919   pir||UQFS ubiquitin precursor - common sunflower (fragment)
321  45801   dbj|BAA97024.1| (AB024035) 30S ribosomal protein S16 [Arabidopsis thaliana]
322  45804   emb|CAA05084.1| (AJ001911) putative Ckc2 [Arabidopsis thaliana]
323  45808   gb|AAF80126.1|AC024174_8 (AC024174) Contains similarity to a fructokinase from Solanum tuberosum gi|585973 and is a member of the pfkB carbohydrate kinase family PF|00294. [Arabidopsis thaliana]
324  45820   sp|Q43291|RL21_ARATH 60S RIBOSOMAL PROTEIN L21 gb|AAB60725.1| (AC000132) Similar to ribosomal protein L21 (gb|L38826). ESTs gb|AA395597, gb|ATTS5197 come from this gene. [Arabidopsis thaliana] gb|AAC33220.1|AAC33220 (AC003970) Putative ribosomal protein L21 [Arabidopsis thaliana] gb|AAK44042.1|AF370227_1 (AF370227) putative ribosomal protein L21 [Arabidopsis thaliana]
325  45837   dbj|BAB02573.1| (AP001299) gene_id: F4B12.10~unknown protein [Arabidopsis thaliana]
326  45850   gb|AAG51729.1|AC068667_8 (AC068667) unknown protein; 16040-11188 [Arabidopsis thaliana]
327  45853   sp|P92792|OM20_SOLTU MITOCHONDRIAL IMPORT RECEPTOR SUBUNIT TOM20 (TRANSLOCASE OF OUTER MEMBRANE 20 KDA SUBUNIT) pir||T07679 protein import receptor TOM20, mitochondrial potato emb|CAA63223.1| (X92491) TOM20 [Solanum tuberosum]
328  45855   gb|AAF99832.1|AC008046_4 (AC008046) Putative ribosomal protein [Arabidopsis thaliana] gb|AAK48976.1|AF370549_1 (AF370549) Putative ribosomal protein [Arabidopsis thaliana]
329  45864   pir||T05075 hypothetical protein T6K21.70 - Arabidopsis thaliana emb|CAA17132.1| (AL021889) putative protein [Arabidopsis thaliana] emb|CAB78791.1| (AL161547) putative protein [Arabidopsis thaliana]
330  45866   sp|Q9ZSW9|TCTP_HEVBR TRANSLATIONALLY CONTROLLED TUMOR PROTEIN HOMOLOG (TCTP) gb|AAD10032.1| (AF091455) translationally controlled tumor protein [Hevea brasiliensis]
331  45869   dbj|BAB01276.1| (AB023046) proline-rich protein APG-like; GDSL-motif lipase/hydrolase-like protein [Arabidopsis thaliana]
332  45874   pir||T51279 ribosomal protein, chloroplast - Arabidopsis thaliana emb|CAC00754.1| (AL390921) ribosomal protein, chloroplast [Arabidopsis thaliana]
333  56465   pir||T47886 transketolase-like protein - Arabidopsis thaliana emb|CAB82679.1| (AL162295) transketolase-like protein [Arabidopsis thaliana]
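By way of illustration only, the pipe-delimited identifiers appearing in Table 1 (for example sp|Q43848|TKTC_SOLTU) can be split into a database tag, an accession, and an entry name. The following Python sketch assumes the three-field layout seen in the table; the names HitId and parse_hit_id are hypothetical and form no part of the BLAST program or of the methods described herein.

```python
from typing import NamedTuple


class HitId(NamedTuple):
    database: str   # database tag as printed, e.g. "sp", "gb", "emb", "dbj", "pir", "ref"
    accession: str  # accession field; empty for entries printed with a doubled pipe
    name: str       # entry or locus name; may be empty


def parse_hit_id(token: str) -> HitId:
    """Split one identifier token from Table 1 into three fields (assumed layout)."""
    # Some renderings show the doubled pipe "||" as the single glyph U+2225.
    token = token.replace("\u2225", "||")
    parts = token.split("|")
    parts += [""] * (3 - len(parts))  # pad identifiers with fewer than three fields
    return HitId(parts[0], parts[1], parts[2])


if __name__ == "__main__":
    for example in ("sp|Q43848|TKTC_SOLTU", "emb|CAA90427.1|", "pir||T14544"):
        print(parse_hit_id(example))
```

Such a split may be convenient when grouping the hits of Table 1 by source database; the free-text descriptions that follow each identifier are not handled by this sketch.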

TABLE 2

SEQ ID NO.  ID Number  Accession Numbers of Hits
334  105039  AAB10624; AAW03319; AAG40462; AAG31692; AAG31693; AAB10624; AAW03319; AAG31691; AAG28890; AAG15242
335  170474  AAG12800; AAG12801; AAG29067; AAG12802; AAG52408; AAG12802; AAG12800; AAG12801; AAG52414; AAG52416
336  175736  AAW81786; AAG17112; AAG09068; AAB46419; AAB46419; AAG17114; AAG09070; AAG17112; AAG09068; AAG17113
280  182206  AAG08581; AAG15243; AAG15242; AAG15244; AAG40461
281  21604   AAG60253; AAG57855; AAG60254; AAG57856; AAG58553
283  23869   AAG58058; AAG58083; AAG58059; AAG58059
285  25008   AAG37867; AAG18287; AAG37866; AAG18286
286  25009   AAG47179; AAG47178; AAG47177; AAG23906; AAG23905
287  25011   AAG29027; AAG29026; AAG29028; AAY70144; AAW29380
288  25015   AAG43616; AAG43614; AAG43615; AAG14253; AAG14254
289  25026   AAG53842; AAG08988; AAG53841; AAG08987; AAG53843
338  25057   AAG31643; AAG31641; AAG31642; AAG61071; AAG61072
290  25062   AAR89750
337  25080   AAB19490; AAG16448; AAG16449; AAG16450; AAG33444; AAB19490; AAG16448; AAG16450; AAG16449; AAG35991
291  25104   AAP96144; AAP91892; AAW23588; AAW23586; AAR33390
292  25118   AAG43743; AAG43745; AAG43744; AAG04118; AAG31959
293  25124   AAY86227; AAY35651; AAY37290
294  25133   AAG50167; AAG50168; AAG50166; AAG06513; AAG06511; AAG50167; AAG06512; AAG50166; AAG06511; AAG33534
295  25144   AAG57079; AAB16305; AAR39299; AAB16306; AAB18198;
296  25164   AAG39753; AAG29096; AAG39752; AAG29095
297  25170   AAG38679; AAG38680; AAG27620; AAG38681; AAG27621
298  25176   AAB63719; AAR93610; AAR97834; AAG07743
299  25196   AAG48727; AAG05244; AAG48728; AAG05245; AAG48729
339  25414   AAG10794; AAG47347; AAG54914; AAG47312; AAG10974; AAG54914; AAG47312; AAG10974
300  25421   AAG31569; AAY95040
301  25425   AAG39142; AAG39143; AAG08248; AAG54876; AAG39144
302  25431   AAG16710; AAG16709; AAG16711; AAG47573; AAG41227
303  27410   AAG47010; AAG47008; AAG47009; AAG42988; AAG17036
304  27424   AAG59539; AAG59538; AAG59537; AAG22267
305  25427   AAG47987; AAG47990; AAG47988
306  27430   AAG44675; AAG44673; AAG44674; AAG51796; AAG29999
307  27440   AAW75919; AAG41126; AAW13670; AAY65872
308  27459   AAG21756; AAG21757; AAG21755; AAG34822; AAG34823
309  27460   AAG52634; AAG09437; AAG52635; AAG09438; AAG09943
310  27468   AAG14730; AAG14731; AAG19773; AAG19774
311  27819   AAG12083; AAG47929; AAG36934; AAG26859; AAG36935
312  27864   AAG54013; AAG54014; AAG54015; AAG05196; AAG05197
313  30087   AAG14746; AAG14744; AAG14745; AAW72963; AAW12660
314  30307   AAY87788; AAY78855; AAY34917; AAY93184
315  30913   AAB04162; AAR10474; AAR21552; AAR21553
316  34136   AAG43687; AAG43685; AAG43686
317  34442   AAG31677; AAG31678; AAG31679; AAW95039; AAY35611
318  37186   AAG31691; AAG40462; AAG40461; AAG31692; AAG40460
319  37188   AAB47020; AAG44658; AAG44659; AAG44660
320  38919   AAG44760
321  45801   AAG24483; AAG20434; AAG39213; AAG39214; AAG24484
322  45804   AAG49622; AAG07101; AAG49621; AAG07100
323  45808   AAG10313; AAG10314; AAG10315; AAG17114; AAG09070
324  45820   AAG24999; AAG25000; AAG43611; AAG47669; AAG27986
325  45837   AAG26406; AAG26407; AAY71916
327  45853   AAG51530; AAG08042; AAG51531; AAG08043; AAG11383
328  45855   AAG25047; AAG23219; AAG11189; AAG09821; AAG09820
329  45864   AAG25072; AAG25071; AAG25070; AAG52744; AAG52745
330  45866   AAG41955; AAG04532; AAG35568; AAG54578; AAG54502
331  45869   AAG45664; AAG26188; AAG26186; AAG26187; AAG45663
332  45874   AAG26336; AAG19244; AAG13376; AAG54975; AAG16267
333  56465   AAG09847; AAG09848; AAG08581

TABLE 3

SEQ ID NO:  ID Number  Nucleic Acid Blast Hits
334  105039  AAA71793; AAT35903; AAC46455; AAC43120; AAC36886; AAA71793; AAT35903; AAF61220; AAF61219; AAF61218
335  170474  AAC35963; AAC50920; AAC50918; AAC42162; AAC35963; AAV64625; AAC37608; AAC34496; AAF66834
336  175736  AAA67291; AAA67287; AAA67285; AAA67288; AAA67278; AAA67275; AAZ43885; AAZ43880
280  182206  AAA71793; AAT35903; AAC34305
281  21604   AAC54480; AAC53467; AAZ86878; AAZ40799; AAV15082
282  23242   AAC88116; AAC78052; AAC77093; AAX90201
283  23869   AAC53550; AAC53560; AAC54140; AAC52688
285  25008   AAC45476; AAC38055; AAC46141; AAC43179; AAC35309;
286  25009   AAC48966; AAC40187; AAZ46156; AAC47963; AAV57911
287  25011   AAC42148; AAC79885; AAC53894; AAC32896; AAF28551
288  25015   AAC47630; AAC36501; AAC51267; AAC46968; AAA27678
289  25026   AAC51455; AAC34464; AAC46521; AAC51766; AAX13230
338  25057   AAC77474; AAX41515; AAC50747; AAC33928; AAX91990; AAA47316; AAF32205; AAA81575; AAF15915
290  25062   AAT34660; AAC79951; AAC64370; AAX91990; AAX99575
337  25080   AAC37361; AAC62026; AAC43789; AAZ28374; AAC62028; AAC37361; AAC62026; AAC44199
291  25104   AAN90116; AAN91903
292  25118   AAC43220; AAC32617; AAC47678
293  25124   AAF09844; AAV68142; AAX20248; AAT70813; AAT17981
294  25133   AAC50085; AAC33510; AAC79623; AAT74200; AAC50085; AAC33510; AAC43825; AAC03410; AAQ33106
295  25144   AAC47787; AAC48287; AAC53139; AAC40672
296  25164   AAC46192; AAC42171; AAC42036; AAC53237
297  25170   AAC49111; AAC39852; AAC43994; AAC49101; AAT42063
298  25176   AAC48552; AAC46307; AAC45561; AAC35280; AAC33374
299  25196   AAC49548; AAC33043; AAC44589
339  25414   AAC35174; AAC49030; AAC49017; AAC52025; AAF68242; AAC52025; AAC49017; AAC51815
300  25421   AAF76597; AAF45152; AAF45151; AAF45144
301  25425   AAC34171; AAC45956; AAC51987; AAC35101
302  25431   AAC37460; AAZ35273; AAC44705; AAA93919; AAX20248
303  27410   AAC48904; AAC47026; AAC40829; AAC47401; AAC37579
304  27424   AAC54087; AAC39572; AAC41582; AAC54192; AAA46327
305  25427   AAC49271; AAC36469; AAC49272; AAC34314; AAF22305
306  27430   AAC47114; AAC49141; AAC49134
307  27440   AAX20560; AAF28099; AAF26320; AAA34915; 34914
308  27459   AAX99592; AAX23321; AAC84836; AAC84826; AAC89559
309  27460   AAC51007; AAC34646; AAC34838; AAC36205
310  27468   AAC36683; AAX90605; AAV25602
311  27819   AAC41085; AAC47648; AAC36131; AAC43614; AAC34045
312  27864   AAF24298; AAT14180; AAF67765; AAC08621
313  30087   AAC36689; AAV64073; AAT51738; AAV64074; AAT51739
314  30307   AAX91990; AAV57903; AAT22884; AAC44294; AAC38980
315  30913   AAF24901; AAA56091
316  34136   AAC40341; AAC47656; AAC38809; AAQ98471; AAC38809; AAC47656; AAC55902; AAC74238; AAX13009
317  34442   AAT12557; AAT34617; AAA44222; AAC59611; AAF65936
318  37186   AAC46455; AAC43120; AAC36886; AAC34801; AC34305;
319  37188   AAC85282; AAC79694; AAC83002; AAX76329
320  38919   AAF22305
321  45801   AAC40401; AAC38898
322  45804   AAC49881; AAC33732; AAC54301; AAC56824
323  45808   AAC34977; AAV64626; AAC82994; AAC34496
324  45820   AAC40597; AAC37179; AAC47629; AAC41759; AAC49148
325  45837   AAC41154; AAF75866; AAF75865; AAC76913; AAF22305
326  45850   AAV80117; AAV80181
327  45853   AAC50591; AAC34094; AAC52632; AAC35406
328  45855   AAC40616; AAC52091; AAC35334; AAC39935
329  45864   AAC40625; AAC51049; AAC44350
330  45866   AAC47019; AAC32771; AAC53835; AAC44612; AAC51854
331  45869   AAC48403; AAC41073; AAC49977; AAC52157; AAA35030
332  45874   AAC51906; AAC41128; AAC38438
333  56465   AAC36886; AAC34305

In addition, the nucleic acid sequences were analyzed using Pfam analysis. The nucleic acid sequences giving stunting phenotypes were further analyzed by translating the sequences into their predicted amino acid sequences. The predicted amino acid sequences were used to search amino acid databases to identify protein homologs and orthologs. One skilled in the art recognizes that a nucleic acid sequence can be translated in one of three reading frames. However, all of the amino acid sequences identified herein have substantial homology to the homolog or ortholog. Therefore, it is contemplated that in some embodiments the amino acid sequences presented herein are translated in the correct reading frame.
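By way of illustration only, translation of a nucleic acid sequence in each of the three forward reading frames can be sketched in Python as follows. The sketch assumes the standard genetic code; the function names translate and forward_frames are hypothetical, and the short input sequence is an arbitrary example rather than any sequence of the present invention. Translation of the complementary strand, which would yield three further frames, is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset frame (0, 1, or 2); '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons containing ambiguous bases are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def forward_frames(seq: str) -> list:
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


if __name__ == "__main__":
    print(forward_frames("ATGGCCTAA"))
```

Each of the resulting predicted amino acid sequences can then be compared against a protein database, and the frame whose translation shows substantial homology to a known protein is taken as the likely correct frame.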

A. Fructokinase

In some embodiments, the present invention comprises sequences identified as having homology to fructokinase. Table 4 shows putative fructokinase enzymes, identified through Pfam searches and searches for protein orthologs and homologs. Table 4 lists sequence identification for the nucleic acid sequence, the Pfam score and P-value, and sequence identification for protein orthologs and homologs for all sequences identified as fructokinases.

Fructokinase is involved in the metabolism of simple sugars in plants. Sucrose translocated from leaves to sink tissue may be stored directly or metabolized by sucrose synthase and/or invertase to provide hexose and hexose phosphate for storage or metabolism. In both sucrose synthase- and invertase-mediated metabolic pathways, fructose is formed as a metabolic product and must be phosphorylated for further metabolism. Two enzymes, hexokinase and fructokinase, are able to phosphorylate fructose in plants. Hexokinase can effectively utilize several hexoses, including fructose and glucose, whereas fructokinase specifically phosphorylates fructose. Fructokinase is likely to be of primary importance in phosphorylation of fructose in plants because the affinity of fructokinase for fructose is much higher than that of hexokinase.

TABLE 4 Fructokinase Sequences

SEQ ID NO  PFam Score  P-Value   Ortholog/Homolog SEQ ID NO
101        17.9        0.00011   167877; 123
105        26.7        1.8e−07   187756; 167877
120        139.9       5.9e−43   123; 190868; 130; 167877
122        127         6.4e−39   123; 190868; 130
123        143.5       4.4e−44   190868
126        38.8        3e−11     186915
128        16.8        0.000240  186915; 167955; 105
130        94.6        9.1e−29   190868
133        34.3        7.7e−10   190868
134        34.3        7.7e−10   133
137        94.9        7.4e−29   190868; 130
149        97.1        97.1      190868; 130; 19086; 123
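By way of illustration only, the Pfam scores and P-values of Table 4 can be screened programmatically before the entries are ranked. The Python sketch below copies the first three columns of Table 4 as printed, converts each P-value to a number, and sets aside any entry whose printed P-value lies outside the range 0 to 1, since such a value cannot be a probability. The names TABLE_4, parse_p_value, and split_rows are hypothetical, and the sketch makes no assumption about what the intended value of a set-aside entry may be.

```python
# Rows copied from Table 4 as printed: (SEQ ID NO, Pfam score, P-value text).
TABLE_4 = [
    (101, 17.9, "0.00011"),
    (105, 26.7, "1.8e−07"),
    (120, 139.9, "5.9e−43"),
    (122, 127.0, "6.4e−39"),
    (123, 143.5, "4.4e−44"),
    (126, 38.8, "3e−11"),
    (128, 16.8, "0.000240"),
    (130, 94.6, "9.1e−29"),
    (133, 34.3, "7.7e−10"),
    (134, 34.3, "7.7e−10"),
    (137, 94.9, "7.4e−29"),
    (149, 97.1, "97.1"),
]


def parse_p_value(text: str) -> float:
    """Convert a printed P-value to float, accepting the typographic minus sign."""
    return float(text.replace("\u2212", "-"))


def split_rows(rows):
    """Return (valid, suspect): valid rows sorted by ascending P-value."""
    valid, suspect = [], []
    for seq_id, score, p_text in rows:
        p = parse_p_value(p_text)
        # A probability must lie between 0 and 1; anything else is set aside.
        target = valid if 0.0 <= p <= 1.0 else suspect
        target.append((seq_id, score, p))
    valid.sort(key=lambda row: row[2])
    return valid, suspect


if __name__ == "__main__":
    valid, suspect = split_rows(TABLE_4)
    print("lowest P-value:", valid[0])
    print("set aside for review:", suspect)
```

Sorting by ascending P-value places the entries with the strongest statistical support for the fructokinase assignment first.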

B. Transketolase

In some embodiments, the present invention comprises sequences identified as having homology to transketolase. Table 5 shows putative transketolase enzymes, identified through Pfam searches and searches for protein orthologs and homologs. Table 5 lists sequence identification for the nucleic acid sequence, the Pfam score and P-value, and sequence identification for protein orthologs and homologs for all sequences identified as having homology to transketolases.

TABLE 5 Transketolase Sequences

SEQ ID NO  PFam Score  P-Value    Ortholog/Homolog SEQ ID NO
81         30.6        2.7e−7     124; 130065
85         235.9       5.8e−67    144; 140
118        113.8       3.3e−30    30310; 144
119        19.9        0.00018    129
121        162.7       6e−45      146; 145; 143
124        150.8       2.4e−41    140; 85
125        347.8       1.2e−100   182731; 148; 141
127        125.3       1.1e−33    129; 97
129        189.3       5.9e−53    97
131        193.8       2.7e−54    129; 97
132        174.7       1.6e−48    85; 144; 140
135        291.7       9e−84      130; 190868; 30310; 140; 144; 123
138        159.1       7.7e−44    129; 97
139        159.1       7.7e−44    129; 97
140        232         8.5e−66    85; 144
141        335.6       5.6e−97    148; 143
142        81          1.1e−20    145
143        323.6       2.3e−93    148; 141; 143
144        191.4       1.4e−53    85; 140
145        225.2       9.5e−64    141; 143; 148
146        228.9       7.1e−65    143; 141
147        220         3.6e−62    182731; 140; 124
148        352.5       4.5e−102   182731

Transketolase is a key enzyme of the non-oxidative pentose phosphate pathway. The effect of its overexpression on aromatic amino acid production was investigated in Corynebacterium glutamicum, a typical amino-acid-producing organism. For this purpose, the transketolase gene of the organism was cloned on the basis of its ability to complement a C. glutamicum transketolase mutant with a pleiotropic shikimic-acid-requiring, ribose- and gluconic-acid-negative phenotype. A cDNA encoding the Calvin cycle enzyme transketolase (TKL) was isolated from Sorghum bicolor via subtractive differential hybridization, and used to isolate several full-length cDNA clones for this enzyme from spinach. Functional identity of the encoded mature subunit was shown by an 8.6-fold increase of TKL activity upon induction of Escherichia coli cells that overexpress the spinach TKL subunit under the control of the bacteriophage T7 promoter. Chloroplast localization of the cloned enzyme is shown by processing of the in vitro synthesized precursor upon uptake by isolated chloroplasts. Southern blot analysis suggests that TKL is encoded by a single gene in the spinach genome. TKL proteins of both higher-plant chloroplasts and the cytosol of non-photosynthetic eukaryotes are found to be unexpectedly similar to eubacterial homologues, suggesting a possible eubacterial origin of these nuclear genes. Chloroplast TKL is the last of the demonstrably chloroplast-localized Calvin cycle enzymes to have been cloned and thus completes the isolation of gene probes for all enzymes of the pathway in higher plants.

C. Ferritin

In some embodiments, the present invention comprises sequences identified as having homology to ferritin. Table 6 shows putative ferritin proteins, identified through Pfam searches and searches for protein orthologs and homologs. Table 6 lists sequence identification for the nucleic acid sequence, the Pfam score and P-value, and sequence identification for protein orthologs and homologs for all sequences identified as having homology to ferritin.

Iron-regulated ferritin synthesis in animals is dominated by translational control of stored mRNA; iron-induced transcription of ferritin genes, when it occurs, changes the subunit composition of ferritin mRNA and protein and is coupled to translational control. Ferritins in plants and animals have evolved from a common progenitor, based on the similarity of protein sequence; however, sequence divergence occurs in the C termini; structure prediction suggests that plant ferritin has the E-helix, which, in horse ferritin, forms a large channel at the tetrameric interface. In contemporary plants, a transit peptide is encoded by ferritin mRNA to target the protein to plastids. Iron-regulated synthesis of ferritin in plants and animals appears to be very different since the 50- to 60-fold increase of ferritin protein, previously observed to be induced by iron in cultured soybean cells, is accompanied by an equivalent accumulation of hybridizable ferritin mRNA and by increased transcription of ferritin genes. Ferritin mRNA from iron-induced cells and the constitutive ferritin mRNA from soybean hypocotyls are identical. The iron-induced protein is translocated normally to plastids. Differences in animal ferritin structure coincide with the various iron storage functions (reserve for iron proteins and detoxification). In contrast, the constancy of structure of soybean ferritin, iron-induced and constitutive, coupled with the potential for vacuolar storage of excess iron in plants suggests that rapid synthesis of ferritin from a stored ferritin mRNA may not be needed in plants for detoxification of iron.

A synthetic siderophore, O-Trensox (L), has been designed and synthesized to improve iron nutrition of plants. The affinity for iron of this ligand [pFe(III)=29.5 and pFe(II)=17.9] is very high compared with EDTA. In spite of its high and specific affinity for iron, O-Trensox was found to be able to prevent, and to reverse, iron chlorosis in several plant species grown in axenic conditions. It also allows the iron nutrition and growth of Acer pseudoplatanus L. cell suspensions. The rate of iron metabolization was monitored by ⁵⁹Fe radioiron. Ferritins, the iron storage proteins, are shown to be the first iron-labelled proteins during iron metabolization and to be able to further dispatch the metal. Using Fe(III)-Trensox, the rate of iron incorporation into ferritin was found to be higher than when using Fe-EDTA, but slower than with Fe-citrate, the natural iron carrier in xylem. During a plant cell culture, the extracellular concentrations of iron complex and free ligand were measured; changes in their relative amounts showed that the iron complex is dissociated extracellularly and that only iron is internalized. This suggests a high affinity for iron of a putative carrier on the plasmalemma. In contrast with Fe-citrate and Fe-EDTA complexes, Fe(III)-Trensox is not photoreducible. Its ability to induce radical damage as a Fenton reagent was tested using supercoiled DNA as target molecule. Unlike Fe-citrate and Fe-EDTA, Fe(II)-Trensox and Fe(III)-Trensox were proven to be harmless even during ascorbate-driven reduction, while Fe-EDTA and Fe-citrate generate heavy damage to DNA.

TABLE 6
Ferritin Sequences

SEQ ID NO   PFam Score   P-Value     Ortholog/Homolog SEQ ID NO
33          66.1         1.2e−18
34          66.1         1.2e−18     33; 25152

X. Contigs and Orthologs Identified by Sequence Analysis

In some embodiments, the present invention comprises nucleic acid sequences (contigs) assembled from SEQ ID NOs:1-154 (See FIG. 1) described above. Contigs were assembled by a computer program configured to match sequences with at least a 50 nucleotide overlap with at least 93% exact homology. The sequences for these contigs are provided in FIG. 2.
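The overlap criterion stated above can be illustrated with a minimal sketch. The assembly program itself is not identified in this specification, so the function names below (best_overlap, merge_reads) and the ungapped suffix-to-prefix comparison are assumptions made for illustration only; only the two thresholds (50 nucleotides, 93% identity) come from the text.

```python
# Illustrative sketch only: an ungapped suffix/prefix overlap test using the
# thresholds given in the text (>= 50 nt overlap, >= 93% identity).
# The function names and the ungapped comparison are assumptions.

def best_overlap(left, right, min_overlap=50, min_identity=0.93):
    """Return (overlap_length, identity) for the longest suffix of `left`
    that matches a prefix of `right` at or above both thresholds, or None."""
    left, right = left.upper(), right.upper()
    longest = min(len(left), len(right))
    for length in range(longest, min_overlap - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        identity = matches / length
        if identity >= min_identity:
            return length, identity
    return None


def merge_reads(left, right, overlap_length):
    """Join two sequences, keeping the overlapping region once."""
    return left + right[overlap_length:]
```

A real assembler would also consider reverse complements and gapped alignments; this sketch shows only how the two stated thresholds act as an acceptance test for joining two sequences.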

Contig 1 (SEQ ID NO: 155) (633 nucleotides) was assembled from SEQ ID NO: 120 and SEQ ID NO: 101. Sequence analysis of contig 1 revealed homology to a Fructokinase (GenBank accession NO: T07588).

Contig 2 (SEQ ID NO: 156) (1127 nucleotides) was assembled from SEQ ID NO: 132, SEQ ID NO: 131, SEQ ID NO: 139 and SEQ ID NO: 138. Sequence analysis of contig 2 revealed homology to a Chloroplast Transketolase (GenBank accession NO: Q42676).

Contig 3 (SEQ ID NO: 157) (1991 nucleotides) was assembled from SEQ ID NO: 132, SEQ ID NO: 145, SEQ ID NO: 146, SEQ ID NO: 141, SEQ ID NO: 143, SEQ ID NO: 148, SEQ ID NO: 147, SEQ ID NO: 140, SEQ ID NO: 135 and SEQ ID NO: 144. Sequence analysis of contig 3 revealed homology to a Chloroplast Transketolase Precursor (GenBank accession NO: Q43848).

Contig 4 (SEQ ID NO: 158) (607 nucleotides) was assembled from SEQ ID NO: 133 and SEQ ID NO: 134. Sequence analysis of contig 4 revealed homology to a Guanosine Kinase (GenBank accession NO: BAA23613).

Contig 5 (SEQ ID NO: 159) (452 nucleotides) was assembled from SEQ ID NO: 7 and SEQ ID NO: 6. Sequence analysis of contig 5 reveals homology to a hypothetical protein from A. thaliana (GenBank accession NO: T02532).

Contig 6 (SEQ ID NO: 160) (391 nucleotides) was assembled from SEQ ID NO: 9 and SEQ ID NO: 15. Sequence analysis of contig 6 reveals homology to a Translation Releasing Factor, RF-1 like protein from A. thaliana (GenBank accession NO: CAB87736).

Contig 8 (SEQ ID NO: 162) (800 nucleotides) was assembled from SEQ ID NO: 33 and SEQ ID NO: 34. Sequence analysis of contig 8 reveals homology to a putative Ferritin Subunit Precursor (GenBank accession NO: AC009991).

Contig 10 (SEQ ID NO: 164) (771 nucleotides) was assembled from SEQ ID NO: 22 and SEQ ID NO: 13. Sequence analysis of contig 10 reveals homology to a hypothetical A. thaliana protein (GenBank accession NO: T04685).

Contig 12 (SEQ ID NO: 165) (633 nucleotides) was assembled from SEQ ID NO: 130 and SEQ ID NO: 149. Sequence analysis of contig 12 reveals homology to a Fructokinase from A. thaliana (GenBank accession NO: T01971).

Contig 13 (SEQ ID NO: 166) (581 nucleotides) was assembled from SEQ ID NO: 50 and SEQ ID NO: 45. Sequence analysis of contig 13 reveals homology to an ATP11a Peroxidase from A. thaliana (GenBank accession NO: CAA67334).

Contig 14 (SEQ ID NO: 167) (701 nucleotides) was assembled from SEQ ID NO: 53 and SEQ ID NO: 43. Sequence analysis of contig 14 reveals homology to a Ferredoxin-Thioredoxin Reductase Subunit A from Zea mays (GenBank accession NO: P80680).

Contig 15 (SEQ ID NO: 168) (693 nucleotides) was assembled from SEQ ID NO: 129 and SEQ ID NO: 127. Sequence analysis of contig 15 reveals homology to a Chloroplast Transketolase Precursor (GenBank accession NO: Q43848).

Contig 16 (SEQ ID NO: 169) (504 nucleotides) was assembled from two copies of SEQ ID NO: 67. Sequence analysis of contig 16 reveals homology to a hypothetical protein, F17K2.25, from A. thaliana (GenBank accession NO: T02475).

Contig 17 (SEQ ID NO: 170) (1626 nucleotides) was assembled from SEQ ID NO: 121, SEQ ID NO: 125, SEQ ID NO: 81 and SEQ ID NO: 124. Sequence analysis of contig 17 reveals homology to Chloroplast Precursor Transketolase (GenBank accession NO: Q43848).

Contig 20 (SEQ ID NO: 171) (649 nucleotides) was assembled from SEQ ID NO: 122 and SEQ ID NO: 123. Sequence analysis of contig 20 reveals homology to a Fructokinase from L. esculentum (GenBank accession NO: AAB51108).

In other embodiments, the present invention comprises nucleic acids corresponding to contigs developed by searching a nucleic acid database (this database contains other sequences developed for the screening programs as described in the Examples) for sequences having homology at the amino acid level. These sequences were then assembled into contigs based on sequence overlaps and a consensus nucleic acid sequence was developed. These contigs are provided in FIGS. 3 and 4 and correspond to SEQ ID NOs.: 172-216.

In still further embodiments, the present invention provides orthologs identified by searching the nucleic acid database for homologous sequences. These orthologs, SEQ ID NOs: 217-280, are provided in FIG. 5.

In addition, a BLAST search of the sequence database at a stringency of e-20 was conducted with SEQ ID NOs:281-343. This led to the identification of the contig and singleton sequences listed in FIG. 9, SEQ ID NOs:344-571.
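As a hedged illustration of applying the e-20 cutoff, the sketch below filters search hits by expectation value. It assumes the hits were exported in a 12-column tab-delimited layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); that layout, the function name filter_hits, and the sample identifiers are assumptions and are not taken from this specification.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-20  # the "e-20" stringency named in the text


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Keep (query, subject, evalue) for rows whose E-value is <= cutoff.

    Assumes a 12-column tab-delimited layout with the E-value in column 11.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= cutoff:
            hits.append((query, subject, evalue))
    return hits


# Hypothetical two-row example; identifiers and values are invented.
sample = (
    "query_1\tsubject_A\t98.5\t200\t3\t0\t1\t200\t1\t200\t3e-45\t250\n"
    "query_1\tsubject_B\t80.0\t60\t12\t0\t1\t60\t5\t64\t1e-05\t40\n"
)
strong_hits = filter_hits(sample)  # only the first row passes the cutoff
```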

As will be understood by those skilled in the art, the present invention is not limited to the particular sequences of the contigs and orthologs described above. Indeed, the present invention encompasses portions, fragments, and variants of the contigs and orthologs as described above. Such variants, portions, and fragments can be produced and identified as described in Section III above. In particularly preferred embodiments, the present invention provides sequences that hybridize to SEQ ID NOs: 155-280 and 344-571 under conditions ranging from low to high stringency. In other preferred embodiments, the present invention provides nucleic acid sequences that inhibit the binding of SEQ ID NOs:155-280 and 344-571 under conditions ranging from low to high stringency. Furthermore, as described above in Section IV, the contigs and orthologs can be incorporated into vectors for expression in a variety of hosts, including transgenic plants.

XI. Sequences Conferring Pesticidal Resistance or Tolerance

The present invention also provides nucleic acid sequences that confer resistance or tolerance to pests and insects. It is contemplated that expression of these polypeptides in plants reduces the susceptibility of plants to damage by insects or pests. In preferred embodiments, nucleic acids that confer insect tolerance or resistance to plants are selected from SEQ ID NOs: 3, 150, 151, 26, 31, 36, 58, 78, 94, 106, 107, 110, 112, 113, 114, 117, 123. These sequences were identified in the insect screen described below (see Example 12C). The present invention is not limited to any particular mechanism of action. Indeed, an understanding of the mechanism of action is not necessary to practice the present invention. However, it is believed that expression of the nucleic acids in plants can lead to insect tolerance or resistance by a variety of methods. In some instances, resistance or tolerance is conferred through a secondary effect of expression of the nucleic acid (for example, expression results in the production of metabolic compounds, such as sterols, that are toxic to an insect).

In other instances, expression of the nucleic acid sequence may also result in the production of a polypeptide that is directly toxic to an insect. Such polypeptides can be expressed from any of the six open reading frames in the nucleic acids corresponding to SEQ ID NOs:3, 150, 151, 26, 31, 36, 58, 78, 94, 106, 107, 110, 112, 113, 114, 117, 123 and portions thereof. These sequences are also useful for screening databases for orthologs. It is contemplated that these orthologs will also have pesticidal activity. Insecticidal activity of portions of the insecticidal polypeptides can be determined by synthesizing the portions or expressing the portions in plants and exposing insects to plant material comprising the polypeptides. An example of such an assay is provided in Example 12C.
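The six reading frames referred to above (three on each strand) can be enumerated with a short sketch. It assumes the standard genetic code and uses illustrative names (six_frame_translations, translate) that do not appear in this specification; ambiguous or incomplete codons are rendered as "X".

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated peptides."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

Each translated frame can then be scanned for stretches between stop codons to locate candidate open reading frames.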

The polypeptides may be administered as a secretion or cellular protein originally expressed in a heterologous prokaryotic or eukaryotic host. Bacteria are typically the hosts in which proteins are expressed. Eukaryotic hosts could include but are not limited to plants, insects, and yeast. Alternatively, the toxins may be produced in bacteria or transgenic plants in the field or in the insect by a baculovirus vector. Typically insects will be exposed to toxins by incorporating one or more of the toxins into the food/diet of the insect.

Complete lethality to feeding insects is preferred, but is not required to achieve functional activity. If an insect avoids the toxin or ceases feeding, that avoidance will be useful in some applications, even if the effects are sublethal or lethality is delayed or indirect. For example, if insect resistant transgenic plants are desired, the reluctance of insects to feed on the plants is as useful as lethal toxicity to the insects since the ultimate objective is protection from insect-induced plant damage rather than insect death.

There are many other ways in which toxins can be incorporated into an insect's diet.

For example, it is possible to adulterate the larval food source with the toxic protein by spraying the food with a protein solution, as disclosed herein. Alternatively, the gene encoding the protein could be genetically engineered into an otherwise harmless bacterium, which could then be grown in culture, and either applied to the food source or allowed to reside in the soil in an area in which insect eradication was desirable. Also, the protein could be genetically engineered directly into an insect food source. For instance, the major food source for many insect larvae is plant material. Therefore the nucleic acid sequences described above can be transferred to plant material so that the plant material expresses the toxin of interest.

It is within the scope of the invention as disclosed herein that the polypeptides may be truncated and still retain functional activity. By “truncated polypeptide” is meant that a portion of a polypeptide may be cleaved and yet still exhibit activity after cleavage. Cleavage can be achieved by proteases inside or outside of the insect gut. Furthermore, effectively cleaved proteins can be produced using molecular biology techniques wherein the DNA bases encoding the toxin are removed either through digestion with restriction endonucleases or other techniques available to the skilled artisan. After truncation, the proteins can be expressed in heterologous systems such as E. coli, baculoviruses, plant-based viral systems, yeast and the like and then placed in insect assays as disclosed herein to determine activity.

EXAMPLES

Example 1

Construction and Characterization of a Normalized Arabidopsis cDNA Library in GENEWARE Vectors

A. Plant Tissue Generation: Arabidopsis thaliana ecotype Columbia (0) seeds were sown and grown on PEAT LITE MIX (Speedling Inc., Sun City, Fla.) supplemented with NUTRICOTE fertilizer (Plantco Inc., Ontario, Canada). Plants were grown under a 16-hour light/8-hour dark cycle in an environmentally controlled growth chamber. The temperature was set at 22° C. for daytime and 18° C. for nighttime. The entire plant, including root, leaves and all aerial parts, was collected 4 weeks post sowing. Tissue was washed in deionized water and frozen in liquid nitrogen.

B. RNA Extraction: High quality total RNA was isolated using a hot borate method. All solutions were made in DEPC-treated, double-deionized water and autoclaved. All glassware, mortars, pestles, spatulas, and glass rods were baked at 400° C. for four hours. All plasticware was DEPC-treated for at least three hours and then autoclaved.

Thirty-five ml of XT buffer (0.2 M Na borate decahydrate, 30 mM EGTA, 1% SDS (w/v), 1% sodium deoxycholate) per 10 grams of tissue was dispensed into 50 ml Falcon tubes. PVP-40,000 was added to a final concentration of 2% (w/v). NP-40 was added to a final concentration of 1% (w/v). Tubes were placed in an 80° C. water bath. The mortars and pestles were then pre-cooled in liquid nitrogen. Proteinase K (0.5 mg/ml XT buffer) was dispensed into 250 ml centrifuge bottles and the bottles were then placed on ice.

The tissue was added to the pre-chilled mortar and pestle and ground to a fine powder. Working as quickly as possible, the tissue was transferred to a glass beaker using a spatula chilled in liquid nitrogen. DTT (1.54 mg/ml XT buffer) was added to the XT buffer/PVP/NP-40 buffer and was immediately added to the ground tissue. The tissue was homogenized using a polytron at level 5 for one minute. The homogenate was decanted into the 250 ml centrifuge bottle containing the proteinase K. The homogenate was incubated at 42° C., 100 rpm for 1.5 hours. Eighty microliters of 2M KCl/ml of XT buffer was added to the homogenate and gently swirled until mixed. The samples were then incubated on ice for one hour. The samples were centrifuged at 12,000×g in a BECKMAN JA-14 rotor (Beckman Instruments, Inc., Fullerton, Calif.) for 20 minutes at 4° C. to remove debris. The supernatant was then filtered through a funnel lined with sterile miracloth into a sterile 250 ml centrifuge bottle. Eight molar LiCl was added to a final concentration of 2M LiCl and the samples were incubated on ice overnight.
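The per-gram and per-milliliter quantities in the extraction steps above scale linearly with the amount of tissue. The following sketch, offered only as a bench-calculation aid, scales them and also works out the volume of 8M LiCl stock needed to reach 2M. The function names are illustrative, the DTT figure is read as 1.54 mg per ml of XT buffer, and the supernatant volume passed to the LiCl helper is whatever is recovered after filtration (not stated in the text).

```python
def xt_buffer_recipe(tissue_g):
    """Scale the extraction reagents to a tissue mass in grams.

    Ratios follow the text: 35 ml XT buffer per 10 g tissue, PVP-40,000 at
    2% (w/v), NP-40 at 1% (w/v), proteinase K at 0.5 mg/ml, DTT at
    1.54 mg/ml, and 80 microliters of 2M KCl per ml of XT buffer.
    """
    buffer_ml = 35.0 * tissue_g / 10.0
    return {
        "xt_buffer_ml": buffer_ml,
        "pvp40_g": 0.02 * buffer_ml,
        "np40_g": 0.01 * buffer_ml,
        "proteinase_k_mg": 0.5 * buffer_ml,
        "dtt_mg": 1.54 * buffer_ml,
        "kcl_2m_ml": 0.080 * buffer_ml,
    }


def stock_volume_for_final(sample_ml, stock_molar, final_molar):
    """Volume of stock to add so the mixture reaches `final_molar`.

    Solves stock_molar * x / (sample_ml + x) = final_molar for x, assuming
    the sample contains none of the solute beforehand.
    """
    if stock_molar <= final_molar:
        raise ValueError("stock must be more concentrated than the target")
    return sample_ml * final_molar / (stock_molar - final_molar)


recipe_60g = xt_buffer_recipe(60)                  # 60 g of tissue
licl_ml = stock_volume_for_final(150.0, 8.0, 2.0)  # 150 ml is a hypothetical volume
```

For 8M stock and a 2M target the helper reduces to one third of the sample volume, since 2/(8−2) = 1/3.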

Precipitated RNA was pelleted by centrifugation at 12,000×g in a BECKMAN JA-14 rotor (Beckman Instruments, Inc., Fullerton, Calif.) for 20 minutes and the supernatant was discarded. The RNA pellet was washed in 5 ml of cold 2M LiCl in 30 ml centrifuge tubes. Glass rods and gentle vortexing were used to break and disperse the RNA pellet. The pellets were centrifuged in a BECKMAN JA-20 rotor at 10 krpm, 4° C., for 10 minutes. The supernatant was decanted. This wash step was repeated 3 times until the supernatant was relatively colorless. The RNA pellet was resuspended in 5 ml of 10 mM Tris-HCl (pH 7.5). The insoluble material was pelleted in a JA-17 rotor at 10 krpm for 10 minutes at 4° C. The supernatant was transferred to another 30 ml centrifuge tube and 0.1× volume of 2M K-acetate (pH 5.5) was added. The samples were incubated on ice for 15 minutes and centrifuged in a BECKMAN JA-17 rotor (Beckman Instruments, Inc., Fullerton, Calif.) at 10 krpm, 4° C., for 10 minutes to remove polysaccharides and insoluble material. The supernatant was transferred to a sterile 30 ml centrifuge tube and RNA was precipitated by adding 2.5× volumes of 100% ethanol. The RNA was precipitated overnight at −20° C. The precipitated RNA was pelleted by centrifugation at 9 krpm, 4° C., for 30 minutes in a JA-17 rotor. The RNA pellet was washed with 5 ml of cold 70% ethanol and centrifuged in a JA-17 rotor at 9 krpm, 4° C., for 10 minutes. The residual ethanol was removed using a BECKMAN speed vac (Beckman Instruments, Inc., Fullerton, Calif.). The RNA pellet was resuspended in 3 ml of DEPC-ddH₂O + 1 mM EDTA. The RNA was precipitated with 0.1× volumes of 3M Na-acetate pH 6.0 and 2× volumes of cold 100% ethanol. The RNA was put at −80° C. for storage. A BECKMAN spectrophotometer (Beckman Instruments, Inc., Fullerton, Calif.) was used to measure absorbance at 260 nm and 280 nm (A₂₆₀ and A₂₈₀). The A₂₆₀ was used to determine concentration (40 μg RNA/ml = 1 A₂₆₀ absorbance unit) and the A₂₆₀/A₂₈₀ ratio was used to determine the initial quality of the RNA (1.8 to 2.0 is good).
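The absorbance-based estimate described above is a one-line conversion. A minimal sketch follows, using the stated factor of 40 μg RNA/ml per A₂₆₀ unit and the 1.8 to 2.0 quality window; the function names and the dilution-factor parameter are illustrative assumptions.

```python
UG_PER_ML_PER_A260 = 40.0  # 1 A260 unit corresponds to 40 ug RNA/ml (from the text)


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Concentration of the undiluted sample from a measured A260 reading."""
    return a260 * UG_PER_ML_PER_A260 * dilution_factor


def ratio_in_quality_window(a260, a280, low=1.8, high=2.0):
    """True when the A260/A280 ratio falls in the 1.8 to 2.0 window."""
    return low <= a260 / a280 <= high


# Hypothetical reading: a 1:100 dilution giving A260 = 0.5 and A280 = 0.26.
conc = rna_concentration_ug_per_ml(0.5, dilution_factor=100)
acceptable = ratio_in_quality_window(0.5, 0.26)
```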

The yield of total RNA from 60 g of tissue is ~15 mg. Then, mRNA was isolated from total RNA using oligo (dT)₂₅ DYNABEADS (Dynal, Inc., Lake Success, N.Y.). Typically, 1% of the total RNA population can be recovered as mRNA from Arabidopsis thaliana whole plants, and from 5 μg of poly A⁺ RNA, approximately 4.5 μg of single strand cDNA and 6.7 μg of double strand cDNA were synthesized.

C. cDNA Synthesis: Poly A⁺ RNA was purified from total RNA using the oligo (dT) DYNABEADS kit (Dynal, Inc., Lake Success, N.Y.) according to the manufacturer's instructions. Briefly, DYNABEADS were resuspended by mixing on a roller and 600 μl was transferred to an RNase-free tube. The beads were further equilibrated with 2× binding buffer (20 mM Tris-HCl, pH 7.5, 1M LiCl, 2 mM EDTA) twice and resuspended in 200 μl of 2× binding buffer. Total RNA (1 mg/200 μl) was heated at 70° C. for 5 minutes and incubated with the above oligo (dT) DYNABEADS for 10 min at RT. The supernatant containing unbound rRNA and tRNA was subsequently removed on a magnetic stand and the beads were washed twice with 1× wash buffer (10 mM Tris-HCl, pH 7.5, 0.15M LiCl, 1 mM EDTA). The mRNA was eluted from the DYNABEADS in ddH₂O and used as the starting material for double strand cDNA synthesis.

Double strand cDNA was synthesized either with NotI-(dT)₂₅ primer or on oligo (dT) DYNABEADS based on the manufacturer's instructions (Gibco-BRL Superscript system). Typically, 5 μg of poly A⁺ RNA was annealed and reverse transcribed at 37° C. with SUPERSCRIPT II reverse transcriptase (Stratagene, La Jolla, Calif.). For the non-normalized cDNA library, double stranded cDNAs were ligated to a 500- to 1000-fold molar excess of SalI adaptor, digested with the restriction enzyme NotI, and size-selected by column fractionation. Those cDNAs were then cloned directionally into the XhoI-NotI sites of the TMV expression vector, 1057 N/P.

D. Normalization Procedure: For the normalized cDNA preparation, the supernatant was removed from the DYNABEADS and the cDNA-containing beads were washed twice with 1×TE buffer. To carry out the normalization process, the second strand cDNA was eluted from the beads. One hundred μl of TE buffer was added to the beads and heated at 95° C. for 5 min and the supernatant was then collected on a magnetic stand. The above procedure was repeated once to ensure complete elution. The yield of second strand cDNA was quantitated using a UV spectrophotometer.

First strand cDNA on beads was combined with second strand cDNA in 4×SSC, 5× Denhardt's and 0.5% SDS for multiple rounds of short hybridization. Since the second strand cDNA was synthesized using the first strand cDNA as the template, approximately the same amount of first and second strand cDNAs were present in the hybridization reaction. Nine μg of second strand cDNA in 200 μl of 1×TE buffer was added to the cDNA driver (first strand cDNA on beads) in a screw cap tube. The reaction was heated at 95° C. for 5 min, then 60 μl of 20×SSC, 30 μl of 50× Denhardt's (1% Ficoll, 1% polyvinylpyrrolidone and 1% bovine serum albumin) and 15 μl of 10% SDS were added and the reaction was brought to 65° C. for 8 hours.
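The volumes listed above can be checked against the nominal hybridization conditions (4×SSC, 5× Denhardt's, 0.5% SDS) with a simple dilution calculation. The sketch below is illustrative; it reads the cDNA volume as 200 μl, ignores any volume contributed by the beads, and uses an assumed helper name (final_concentrations).

```python
def final_concentrations(components):
    """Return (total_volume, {name: final_concentration}).

    `components` maps a name to (volume_ul, stock_concentration); a stock of
    None marks a component that only contributes volume.
    """
    total = sum(volume for volume, _ in components.values())
    finals = {
        name: stock * volume / total
        for name, (volume, stock) in components.items()
        if stock is not None
    }
    return total, finals


hybridization_mix = {
    "cDNA in 1x TE": (200.0, None),
    "SSC (x)": (60.0, 20.0),         # 60 ul of 20x SSC
    "Denhardt's (x)": (30.0, 50.0),  # 30 ul of 50x Denhardt's
    "SDS (%)": (15.0, 10.0),         # 15 ul of 10% SDS
}
total_ul, finals = final_concentrations(hybridization_mix)
```

With these inputs the total volume is 305 μl and the computed values come out close to the nominal 4×, 5× and 0.5% figures.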

The beads and supernatant were separated at 65° C. by magnet. The supernatant was transferred to a fresh tube and kept at 65° C. The beads were regenerated by adding 200 μl of ddH₂O and heating at 95° C. for 5 min. The beads were collected for the next round of hybridization and the solution containing the bound second strand cDNA was kept for further analysis. The partially normalized second strand cDNA solution was added back to the regenerated beads and returned to another 8-hour round of hybridization. This procedure was repeated 4-5 times.

E. Slot blot analysis: To follow the process of cDNA normalization a rapid slot blot procedure was developed. Following sequencing of 960 cDNAs, 46 cDNAs were selected to follow the representation of various classes of cDNAs through the normalization procedure. Based on their frequency of appearance among the sequenced clones, these clones represent transcripts of different expression levels (high, moderate and low). Ten nanograms of each cDNA were deposited onto a HYBOND-N⁺ membrane (Amersham Pharmacia Biotech, Chicago, Ill.) along with control vector (pBS) and water controls. DNA was denatured, neutralized, and subsequently crosslinked to the membrane using a UV-STRATALINKER 2400 (Stratagene, La Jolla, Calif.).

cDNAs from either the non-normalized or the normalized pool were labeled with ³²P and hybridized on the slot blot membrane overnight at 65° C. in 1% bovine serum albumin, 1 mM ethylenediaminetetraacetic acid (EDTA), 0.5 M sodium phosphate (pH 7.2), and 7% sodium dodecyl sulfate (SDS). Then, blots were washed once in 1×SSC/0.2% SDS for 20 min at room temperature followed by two washes in 0.2×SSC/0.2% SDS for 20 min at 65° C. The resulting membranes were then developed using a PHOSPHORIMAGER (Amersham Pharmacia Biotech, Chicago, Ill.) and quantitated using available software.

F. Conversion of single-stranded normalized cDNAs to double-stranded form: Second strand normalized cDNA in hybridization solution was purified by QIAQUICK column (QIAGEN GmbH, Hilden, Germany) and eluted in 88 μl of ddH₂O (total ~1.2 μg of DNA is recovered). One μl (3 μg) of NotI-oligo dT primer was added and heated at 95° C. for 5 min followed by cool down to 37° C. The first strand cDNA was extended with T7 DNA polymerase (Amersham Pharmacia Biotech, Chicago, Ill.) in the presence of dNTP in 120 μl reaction at 37° C. for 1 hour. T4 DNA polymerase (NEB) was then used to polish the ends following the extension reaction for 5 min at 16° C. The resulting double strand cDNA was ethanol precipitated and ligated with 500- to 1000-fold molar excess of SalI adaptor followed by NotI digestion. The resulting cDNAs were size-fractionated using a Clontech spin column 400 and the first two fractions that contained the cDNAs were pooled and used for the subsequent cloning process.
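To give a sense of scale for a 500- to 1000-fold molar excess of adaptor, the following sketch converts a mass of double-stranded cDNA into picomoles and multiplies by the fold excess. It relies on the common approximation of about 660 g/mol per base pair, takes the excess relative to cDNA molecules (the text does not say whether molecules or ends are meant), and uses a hypothetical input of 1 μg of cDNA with a mean length of 1,000 bp. None of these inputs, nor the function names, come from this specification.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # common approximation for one base pair of dsDNA


def dsdna_pmol(mass_ug, length_bp):
    """Picomoles of double-stranded DNA molecules of a given mean length."""
    return mass_ug * 1e6 / (length_bp * AVG_BP_MASS_G_PER_MOL)


def adaptor_pmol_range(cdna_ug, mean_length_bp, low_fold=500, high_fold=1000):
    """Adaptor amounts (pmol) spanning the stated 500- to 1000-fold excess."""
    cdna = dsdna_pmol(cdna_ug, mean_length_bp)
    return cdna * low_fold, cdna * high_fold


# Hypothetical input: 1 ug of cDNA with a mean length of 1,000 bp.
low_pmol, high_pmol = adaptor_pmol_range(1.0, 1000)
```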

G. Construction of cDNA libraries in GENEWARE vectors: (+) Sense cDNA clones were prepared as follows. The Tobacco Mosaic Virus expression vector, 1056GTN-AT9, was linearized with NotI and XhoI and a 900 bp stuffer DNA was removed. The stuffer DNA between those two sites serves to ensure complete digestion by the restriction enzymes and thus a high cloning efficiency. The digested vector was gel purified and then used to set up a ligation reaction with normalized cDNA SalI-NotI fragments to generate (+) sense cDNA clones.

(−) Sense cDNA clones were prepared as follows. The Tobacco Mosaic Virus expression vector 1057 NP was also linearized with NotI and XhoI and a stuffer DNA fragment was removed. The digested vector was gel purified and used to set up a ligation reaction to generate the (−) sense strand library.

Each ligation was transformed into chemically competent E. coli cells, DH5α, according to the manufacturer's instructions (Life Technologies, Rockville, Md.). Cloning efficiency was assessed in a preliminary analysis by plating a small portion of the transformation, while archiving the majority for future applications. Vector-only ligations gave ~2×10⁴ cfu/μg vector and ligations with cDNA insertions gave ~5×10⁵ cfu/μg.
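The two transformation efficiencies reported above allow a rough estimate of the expected fraction of insert-free (religated) clones. The sketch below treats the ratio of vector-only to vector-plus-insert colony counts as that fraction, which is a simplifying assumption, and uses illustrative function names. The resulting figure of about 4% agrees with the vector religation background later observed by restriction digestion.

```python
def vector_background_fraction(vector_only_cfu_per_ug, with_insert_cfu_per_ug):
    """Approximate fraction of colonies expected to carry religated vector."""
    return vector_only_cfu_per_ug / with_insert_cfu_per_ug


def expected_recombinants(colonies_picked, background_fraction):
    """Expected number of picked colonies that carry an insert."""
    return colonies_picked * (1.0 - background_fraction)


background = vector_background_fraction(2e4, 5e5)   # values from the text
with_insert = expected_recombinants(96, background)  # one 96-colony pick
```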

To support the ability to transfect plants, a TMV based vector identified as PBSG1057 was deposited under the Budapest Treaty with the ATCC. It is designated ATCC # 203951. A linker sequence 5′-CCCACGCGTCCG-3′ (SEQ ID NO: 572) is placed at the 5′ end of each sequence for insertion into the viral vector.

H. Analysis of Normalized cDNA populations: With each successive round of kinetic re-association, the total cDNA population is depleted, thereby confirming the removal of a population of the cDNA from the mixture at each step. To further understand the consequences of this depletion and measure the relative normalization in cDNA representation following various stages of the kinetic re-association method, slot blots of 46 genes of varying representations were hybridized with probes made from non-normalized and normalized cDNA preparations. The resulting blots were then analyzed for representation by PHOSPHORIMAGER analysis. The hybridization pattern of non-normalized cDNA to the gene array reveals a quite asymmetric representation, with some genes hybridizing with great intensity while others show no hybridization at all. The variance among hybridization intensities for each spot within the filter was measured by standard deviation and found to be 649. In order to analyze the cDNA fraction depleted from the mixture, the first strand magnetic bead matrix was eluted, a radioactive probe was generated and hybridized to a replica of the slot blot described above. The hybridization intensity shows that primarily cDNAs of higher copy number were bound and removed from the normalized cDNA population, confirming that the depletion phenomenon correlated with removal of primarily high copy number cDNAs. The cDNA population not bound to first strand magnetic beads after 5 serial passages was collected, and a radioactive probe was generated and hybridized to a replica slot blot of the known gene set described above. The resulting hybridization pattern is in striking contrast to that of the non-normalized cDNA and to that of the bound cDNA fraction. Assuming that the majority of the hybridization signal to the slot blot for the non-normalized cDNA blot results from hybridization to high abundance genes, an initial comparison can be made between the number of bound counts on the normalized versus non-normalized slot blots. This comparison is possible since each probe added to the blots was derived from the same quantity of cDNA material and an equal number of probe counts were applied to the blots. The non-normalized blot contained 17,898 counts while the normalized blot contained only 1494 counts. This represents a 12-fold reduction in overall signal, indicating a significant reduction in high gene copy number in the normalized cDNA population.
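The 12-fold figure follows directly from the two count totals. A minimal sketch of the calculation, with an illustrative function name:

```python
def fold_reduction(reference_counts, sample_counts):
    """Ratio of reference signal to sample signal (values > 1 mean reduction)."""
    if sample_counts <= 0:
        raise ValueError("sample_counts must be positive")
    return reference_counts / sample_counts


# Counts reported for the non-normalized and normalized blots.
reduction = fold_reduction(17898, 1494)  # approximately 12-fold
```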

When the hybridization intensity of the non-normalized cDNA probe to each gene is plotted against the relative number of counts (following subtraction of the pBS vector control intensity from each sample), there is almost a 4-log difference in sequence representation in the cDNA population and an overall variance in standard deviation of 649-fold. In contrast, the hybridization of normalized cDNA probe to each gene revealed an average 32-fold difference. This represents both a reduction in high copy cDNAs and an increased representation in low copy cDNAs by >3 logs. The variance between the most highly represented cDNA and the lowest represented cDNA within the normalized cDNA population was ~1.5 logs. The above values characterizing the degree of library normalization are equivalent to those achieved by Soares, et al. (1994).
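The relationship between fold differences and "logs" used above can be made explicit: a 32-fold spread corresponds to log₁₀(32) ≈ 1.5 logs, and a 4-log spread to a 10,000-fold difference. The sketch below subtracts a vector-control background from each spot and reports the spread in log₁₀ units. The function name and the intensity values are hypothetical and chosen only to reproduce a 32-fold spread.

```python
import math


def dynamic_range_logs(intensities, background):
    """log10 of (max / min) over background-subtracted, positive intensities."""
    corrected = [value - background for value in intensities if value > background]
    if len(corrected) < 2:
        raise ValueError("need at least two spots above background")
    return math.log10(max(corrected) / min(corrected))


# Hypothetical spot intensities with a vector-control background of 100 counts.
spots = [32100.0, 8100.0, 2100.0, 1100.0]
spread = dynamic_range_logs(spots, background=100.0)  # log10(32000 / 1000)
```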

I. Analysis of GENEWARE clones: To ascertain the cloning efficiency of normalized cDNA into each vector and the average insert size, 96 random colonies were picked and grown by standard methods. DNA was isolated from bacteria using a BIOROBOT 9600 (QIAGEN GmbH, Hilden, Germany). DNA was digested with NotI and BsiWI restriction endonucleases (recognition sites flank the cDNA insertion). The digestions were separated on agarose gels and visualized by ethidium bromide staining. The digestions revealed a vector religation background of 4%. Ligations giving >75% insertions passed quality control and more colonies were picked. Approximately 600 independent clones were analyzed by restriction digestion as described above. Interestingly, a similar percentage of vector background (4%) was detected and the average insert size in the vector was ~1 kb, with many inserts of 2 kb or greater. Following analysis of DNA by restriction mapping, DNA was subjected to sequencing and further analysis.

J. Sequence Analysis of the Normalized Arabidopsis Library in GENEWARE: Initial analysis of the non-normalized Arabidopsis cDNA library required the sequencing of 1709 independent clones. Three 96-well plates (SeqID #56601-56896) of randomly picked normalized Arabidopsis library in GENEWARE [(−) sense] were initially sequenced by primer TP6 to yield 262 5′ sequences that passed sequence quality control. Initially, internal cluster analysis was performed to identify identical sequences in this sequence subset. Analysis using the BLASTN algorithm showed that of the 262 sequences analyzed, 252 were unique and only 10 were found to cluster into five two-member clusters. We then identified the redundancy of the sequences against the larger public databases. For cluster analysis, we used a very low BLASTX score criterion (e=10⁻⁶) and compared all sequences against the GENBANK nr database (United States Department of Health and Human Services). In this manner, we could derive the most information concerning the redundancy, gene type found and open reading frame status of all clones simultaneously. The low BLASTX score was used to allow all possible protein homologues to be identified. The clustering analysis revealed that of the 262 sequences there were 252 single member sequence clusters and five two-gene clusters. This represents 96% singletons from this sample size. The genes appearing more than once in the library varied from two different chlorophyll a/b binding proteins and lipid transport proteins to ferredoxin-thioredoxin reductases. This result compares quite favorably to the 4 redundant clones (of one gene type) identified by Soares, et al. (1994) from 187 randomly picked clones from one normalized library.
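The internal cluster analysis described above amounts to grouping sequences that are linked by pairwise similarity. The sketch below shows one way such grouping could be done, using single-linkage clustering with a union-find structure; the clustering software actually used is not named in this specification, and the sequence indices and similarity pairs are placeholders arranged to reproduce the reported 252 singletons and five two-member clusters.

```python
def cluster_sequences(ids, similar_pairs):
    """Group ids into clusters; any chain of similar pairs joins one cluster."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Placeholder data: 262 sequences, five of which pair up with another.
sequence_ids = list(range(262))
pairs = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
clusters = cluster_sequences(sequence_ids, pairs)
singletons = sum(1 for c in clusters if len(c) == 1)
singleton_percent = 100.0 * singletons / len(sequence_ids)  # about 96%
```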

Further analysis of the sequences from the GENEWARE normalized cDNA library revealed that of the 262 sequences subjected to BLASTX search of the GENBANK nr database, 29% of the sequences failed to show significant homology to any characterized protein or open reading frame (ORF). Of the 252 singletons in the library, 179 showed a single hit to an identified ORF, while 73 showed no hit. These results suggest that, in spite of the well-characterized nature of the sequence database, quality libraries can still contain a high proportion of new expressed sequences.

The excellent representation and extremely low redundancy observed in these initial plates of normalized Arabidopsis cDNAs in GENEWARE prompted us to sequence additional clones. This was important because there is often a significant bias in small sample sizes with regard to representation. A total of 1,151 sequences passed sequence quality control. Internal cluster analysis showed that ~260 multi-sequence clusters were present, with the highest representation at 6 members and the majority with only 2 members (~150). About 600 unique clusters were identified from the total of 856 clusters from the 1,151 sequences. Therefore, from the 1,151 sequences analyzed, 1,010 unique genes were identified, or an 87.7% gene discovery rate. In contrast, internal cluster analysis of the non-normalized Arabidopsis cDNA sequences revealed ~840 multi-gene clusters with the highest represented cluster containing 27 members. Cluster analysis of the 1,709 non-normalized Arabidopsis cDNAs revealed clusters of 27 members and many other highly populated clusters. The dramatic difference in the normalized population is clearly observed by plotting cluster number versus number of members.

Further comparison of 1,151 randomly chosen non-normalized sequences for redundancy with the results from the 1,151 normalized population clearly shows the positive effects of normalization and the greater number of unique genes identified from this normalized population. The reduction in the representation of individual genes in the normalized library compared with the non-normalized population can be observed. Clearly, many genes that have representations of >12 in the non-normalized library have been reduced to 1-4 members in the normalized population. One chlorophyll a/b binding protein gene shows a reduction from 15 members in the non-normalized population to 1 in the normalized library, whereas a gene encoding a distinct chlorophyll a/b binding protein shows less reduction (7 as compared with 5) in the normalized gene population. This is consistent with the observation that certain genes did not undergo the same degree of normalization compared with other genes.

Additional sequences from the normalized Arabidopsis library were obtained by sequence analysis. BLASTN analysis of the 1,343 normalized sequences revealed that 858 were represented in the Arabidopsis EST database, while the remaining 485 sequences were apparently unique, with no obvious homologue in the database. Of those sequences showing BLASTN hits, 43.6% showed coverage of the first through tenth base in the longest EST in the database. Furthermore, 242 of the 858 (28%) showed 5′ sequences that were at the first base of the longest EST or longer. These data show that the cDNAs cloned into GENEWARE are of significant quality and represent, in many cases, the longest 5′ sequences obtained to date. To further ascertain the proportion of cDNAs containing full-length protein open reading frames, we employed the ORF finder program used to analyze the ABRC library for sense clones. This algorithm checks for ATG sequences in the first 70 bases of a sequence and then scans for sequences lacking an in-frame stop codon for at least 300 nt downstream in the same frame. To understand the number of quality open reading frames (ORFs) in a library, we used the ABRC library as a benchmark. Analysis of 11,957 sequences within the ABRC library with the ORF finder program revealed 3,207 hits (26.8%) with putative open reading frames. From the 1,343 sequences of the normalized Arabidopsis cDNA library in GENEWARE, 907 (67.5%) were hits using the ORF finder program. Coupling the number of cDNAs that represent near the 5′ end of the known RNA sequence (43.6%) with the number of clones that contain putative intact ORFs (67.5%) testifies to the quality and integrity of the cDNAs in the GENEWARE vector. These data clearly indicate a high proportion of full-length clones.
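The ORF finder rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program actually used: the function name, the choice to count the 300 nt from the codon following the ATG, and the treatment of an ATG as "in the first 70 bases" when it starts there are all assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_putative_orf(seq, start_window=70, min_open_nt=300):
    """Return True if an ATG starting within the first `start_window` bases
    is followed, in the same frame, by at least `min_open_nt` nucleotides
    that contain no stop codon (assumed reading of the rule in the text)."""
    seq = seq.upper()
    pos = seq.find("ATG")
    while pos != -1 and pos < start_window:
        open_nt = 0
        # Walk codon by codon downstream of the candidate start codon.
        for i in range(pos + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                break
            open_nt += 3
        if open_nt >= min_open_nt:
            return True
        # Try the next ATG, which may lie in a different frame.
        pos = seq.find("ATG", pos + 1)
    return False
```

A sequence for which this returns True would be counted as a "hit" in the sense used above; the thresholds are exposed as parameters so the 70-base and 300-nt values can be varied.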

K. Quantity of Normalized Arabidopsis cDNAs Cloned into GENEWARE Vectors: As previously described, the normalized Arabidopsis cDNA population was cloned into GENEWARE vectors in both the positive (+) and negative (−) sense direction to allow for both overexpression and gene knockout analysis. The total number of clones in the 1057 PN vector in negative orientation was 20,160. These were arrayed into 210 96-well glycerol stock plates. Likewise, 20,160 clones from the ligation of normalized Arabidopsis cDNA in sense orientation into 1056 GTN vector have been arrayed in 210 96-well glycerol stock plates. These numbers clearly show that the GENEWARE vectors can be used as primary cloning vectors and that very complex libraries can be obtained in two orientations from a single pool of non-amplified normalized cDNA.

Example 2 Construction of Tissue-specific N. benthamiana cDNA Libraries

A. mRNA Isolation: Leaf, root, flower, meristem, and pathogen-challenged leaf cDNA libraries were constructed. Total RNA samples from 10-5 μg of the above tissues were isolated by TRIZOL reagent (Life Technologies, Rockville, Md.). The typical yield of total RNA was 1 mg. PolyA⁺ RNA was purified from total RNA by DYNABEADS oligo (dT)₂₅. Purified mRNA was quantified by UV absorbance at OD₂₆₀. The typical yield of mRNA was 2% of total RNA. The purity was also determined by the ratio of OD₂₆₀/OD₂₈₀. The samples had OD ratio values of 1.8-2.0.

B. cDNA Synthesis: cDNA was synthesized from mRNA using the SUPERSCRIPT plasmid system (Life Technologies, Rockville, Md.) with cloning sites of NotI at the 3′ end and SalI at the 5′ end. After fractionation through a gel column to eliminate adapter fragments and short sequences, cDNA was cloned into both GENEWARE vector p1057 NP and phagemid vector PSPORT in the multiple cloning region between NotI and XhoI sites. Over 20,000 recombinants were obtained for all of the tissue-specific libraries.

C. Library Analysis: The quality of the libraries was evaluated by checking the insert size and recombinant percentage in 24 representative clones. Overall, the average insert size was above 1 kb, and the recombinant percentage was >95%.

Example 3

Construction of Normalized N. benthamiana cDNA Library in GENEWARE Vectors

A. cDNA synthesis. A pooled RNA source from the tissues described above was used to construct a normalized cDNA library. Total RNA samples were first pooled in equal amounts, then polyA⁺ RNA was isolated by DYNABEADS oligo (dT)₂₅. The first strand cDNA was synthesized by the Smart III system (Clontech, Palo Alto, Calif.). During the synthesis, adapter sequences with Sfi1a and Sfi1b sites were introduced at the 3′ end by the polyA priming, and at the 5′ end by the template switch mechanism (Clontech, Palo Alto, Calif.). Eight μg first strand cDNA was synthesized from 24 μg mRNA. The yield and size were determined by UV absorbance and agarose gel electrophoresis.

B. Construction of Genomic DNA driver. Genomic DNA driver was constructed by immobilizing biotinylated DNA fragments onto streptavidin-coated magnetic beads. Fifty μg genomic DNA was digested by EcoRI and BamHI followed by a fill-in reaction using biotin-21-dUTP. The biotinylated fragments were denatured by boiling and immobilized onto DYNABEADS by the conjugation of streptavidin and biotin.

C. Normalization Procedure. Six μg of the first strand cDNA was hybridized to 1 μg of genomic DNA driver in 100 μl of hybridization buffer (6×SSC, 0.1% SDS, 1× Denhardt's buffer) for 48 hours at 65° C. with constant rotation. After hybridization, the cDNA bound on genomic DNA beads was washed 3 times with 20 μl 1×SSC/0.1% SDS at 65° C. for 15 min and one time with 0.1×SSC at room temperature. The cDNA bound to the beads was then eluted from the beads in 10 μl of freshly made 0.1N NaOH and purified by using a QIAGEN DNA purification column (QIAGEN GmbH, Hilden, Germany), which yielded 10 ng of normalized cDNA fragments. The normalized first strand cDNA was converted to double strand cDNA in 4 cycles of PCR with Smart primers annealed to the 3′ and 5′ end adapter sequences.

D. Evaluation of normalization efficiency. Ninety-six non-redundant cDNA clones selected from a randomly sequenced pool of 500 clones of a previously constructed whole seedling library were used to construct a nylon array. One hundred ng of the normalized cDNA fragments vs. the non-normalized fragments were radioactively labeled by ³²P and hybridized to DNA array nylon filters. The hybridization images and intensity data were acquired by a PHOSPHORIMAGER (Amersham Pharmacia Biotech, Chicago, Ill.). Since the 96 clones on the nylon arrays represent different abundance classes of genes, the variance of hybridization intensity among these genes on the filter was measured by standard deviation before and after normalization. Our result indicated that by using this type of normalization approach, we could achieve a 1000-fold reduction in variance among this set of genes.

E. Cloning of normalized cDNA into GENEWARE vector. The normalized cDNA fragments were digested by Sfi1 endonuclease, which recognizes 8-bp sites with variable sequences in the middle 4 nucleotides. After size fractionation, the cDNA was ligated into GENEWARE vector p1057 NP in antisense orientation and transformed into DH5α cells. Over 50,000 recombinants were obtained for this normalized library. The percentage of insert and size were evaluated by Sfi digestion of 96 randomly picked clones followed by electrophoresis on a 1% agarose gel. The average insert size was 1.5 kb, and the percentage of insert was 98%, with vector-only clones of <2%.

F. Sequence analysis of normalized cDNA library. Two plates of 96 randomly picked clones have been sequenced from the 5′ end of cDNA inserts. One hundred ninety-two quality sequences were obtained after trimming of vector sequences and other standard quality checking and filtering procedures, and subjected to BLASTX search in DNA and protein databases. Over 40% of these sequences had no hit in the databases. Clustering analysis was conducted based on accession numbers of BLASTX matches among the 112 sequences that had hits in the databases. Only three genes (tumor-related protein, citrin, and rubit) appeared twice. All other members in this group appeared only once. This was a strong indication that this library is well-normalized. Sequence analysis also revealed that 68% of these 192 sequences had putative open reading frames using the ORF finder program (as described above), indicating possible full-length cDNA.

Example 4

DNA Preparation

A. High Throughput Clone Preparation: Arraying of the ABRC library into GENEWARE vectors was conducted to obtain ~5,000 antisense and ~3,000 sense clones with minimal redundancy. The ligations were between highly purified and quality controlled GENEWARE cloning vector plasmids and the corresponding fragments from each individual pool of ABRC clones. Cloning efficiencies were in the range of 1×10⁵ to 5×10⁵ per μg of plasmid. Colonies were picked using a Flexys Colony Picker (The Sanger Centre, England) and manual methods. Colonies were applied to deep-well cell growth blocks (DWBs) and grown for 18-26 hours at 37° C. at ~500 rpm in the presence of ampicillin at 500 μg/ml. From the almost 9,000 colonies picked by the Flexys, >97% of the cultures successfully grew. DNA was prepared using the QIAGEN BIOROBOT 9600 DNA robots and QIAGEN 96-well manifolds (manual preparation) at a rate of ~2,000 DNA preparations per day. The final throughput, during campaign production, estimated for each system was ~20 plates of 96 samples per day, per production line (robotic or manual). Such throughput could be sustained to generate 20-40,000 samples in a matter of one to two weeks of effort. During one ten-day period, one hundred forty (140) 96-well plates of DNA were produced.

B. Quality Control Methods: DNA samples were subjected to quality control (QC) analysis by at least one of two methods: 1) restriction endonuclease digestion and analysis by agarose gel electrophoresis (all plates) or 2) UV spectroscopy to determine DNA quantitation for all 96 samples of a plate (statistical sampling of each day's output). For UV analysis, an aliquot of the DNA samples from each plate was taken and measured using a Molecular Dynamics UV spectrometer in 96-well format (Molecular Dynamics, Sunnyvale, Calif.). DNA concentrations of 0.05-0.2 μg/μl with OD 260/280 ratios of 1.7±0.2 are expected. For DNA sequencing purposes (a downstream method to be used to analyze all hit samples), DNA quantity of ~0.04-0.2 μg/μl is desired. In general, plates that contain >25% of samples not conforming to this metric are rejected and new DNA for the plate must be generated. For confirmation of the presence of insertions and full-length GENEWARE vector, agarose gel electrophoresis of restriction endonuclease fragments was used. Aliquots of sixteen samples from each 96-well DNA plate were targeted for restriction digestion using Nco I and BstE II restriction endonucleases. Samples were separated on 1% agarose gels. Generally, plates that showed >25% of samples that were not full length or did not contain insertions were rejected. From a total of 140 96-well DNA plates prepared, 112 passed QC and were made available for generation of infectious units.
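The plate-level rejection rule above (reject when more than 25% of samples do not conform) can be expressed as a short Python sketch. The function names are hypothetical, and the per-sample test assumes the "expected" UV ranges given in the text (0.05-0.2 μg/μl and an OD 260/280 ratio of 1.5-1.9) are the metric being applied; the text leaves open whether the sequencing range is meant instead.

```python
def sample_conforms(conc_ug_per_ul, od_260_280):
    """Assumed per-sample UV criterion: concentration 0.05-0.2 ug/ul
    and OD 260/280 ratio within 1.7 +/- 0.2 (i.e. 1.5-1.9)."""
    return 0.05 <= conc_ug_per_ul <= 0.2 and 1.5 <= od_260_280 <= 1.9

def plate_passes_qc(conforming_flags, max_fail_fraction=0.25):
    """Return True unless MORE than `max_fail_fraction` of the samples
    on the plate failed their per-sample check."""
    flags = list(conforming_flags)
    if not flags:
        raise ValueError("plate has no samples")
    failures = sum(1 for ok in flags if not ok)
    return failures / len(flags) <= max_fail_fraction

# Usage with placeholder readings (not data from this work):
readings = [(0.10, 1.7)] * 80 + [(0.01, 1.2)] * 16
plate_ok = plate_passes_qc(sample_conforms(c, r) for c, r in readings)
```

The same `plate_passes_qc` helper applies equally to the restriction-digest check, where each flag would record whether a sampled well showed a full-length vector with an insertion.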

Example 5 High-Throughput DNA Sequencing and Sequence Analysis Protocols

A. Generation of Raw Sequence Data and Filtering Protocols: High-throughput sequencing was carried out using the PCT200 and TETRAD PCR machines (MJ Research, Watertown, Mass.) in 96-well plate format in combination with two ABI 377 automated DNA sequencers (PE Corporation, Norwalk, Conn.). The throughput at present is six 96-well plates per day.

The quality of sequence data is improved by filtering the raw sequence output from the sequencer. One criterion is that the unreadable bases are less than 10% of the total number of bases for any sequence and that there are no more than ten consecutive Ns in the middle part of the sequence (bases 40-450). The sequences that pass these tests are defined as being of high quality.
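As a minimal sketch of the N filter just described, the following Python function applies both tests to a single read. The function name is hypothetical, and the reading of "40-450" as 1-based, inclusive base positions is an assumption.

```python
def passes_n_filter(seq, max_n_fraction=0.10, max_n_run=10, middle=(40, 450)):
    """Return True if fewer than `max_n_fraction` of the bases are N and the
    middle region (1-based, inclusive; assumed) has no run of Ns longer
    than `max_n_run`."""
    seq = seq.upper()
    if not seq:
        return False
    # Test 1: unreadable bases must be less than 10% of the read.
    if seq.count("N") / len(seq) >= max_n_fraction:
        return False
    # Test 2: longest run of consecutive Ns inside the middle region.
    start, end = middle
    longest = run = 0
    for base in seq[start - 1:end]:
        run = run + 1 if base == "N" else 0
        longest = max(longest, run)
    return longest <= max_n_run
```

Reads for which this returns True correspond to the "high quality" set; everything else would be dropped before vector removal.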

The second step for improving the quality of a sequence is to remove the vectors from the sequence. There are two advantages of this process. First, when locating the vector sequence, its position can be used to align to the input sequence. The quality of the sequence can be evaluated by the alignment between the vector sequence and the target sequence. Second, the removal of the vector sequence greatly improves the signal-to-noise ratio and makes the analysis of the resulting database search much easier. A third important prefiltering step is to eliminate the duplicates in a library, which speeds up the analysis and reduces redundant analyses.

B. Sequence Data Analysis and Bioinformatics: Once the filtering and the vector sequence removal steps are completed, the resulting sequences are subjected to database search. First, low sensitivity methods such as BLASTN and BLASTX can be used. For those sequences that have no hit, more sensitive methods, such as Blimps and Pfam, can be used. To speed up the analysis process, appropriate filters may be used. For example, for EST sequences from a given cDNA library sequenced from the 5′ end, an ATG filter can be used to make sure that only full-length cDNA will be analyzed. The filtered sequence can be translated in one frame rather than six frames for Pfam analysis.

The results from the database search are stored in the relational database and can be used for further analysis. For example, all the BLAST results can be stored in a relational table that contains Query, Score, pValue, Hit, Length, Annotation, Frame, Identity, Homology, Query Length, Subject Length, Database Queried and Method used to analyze. Any result can be queried and analyzed by the fields mentioned. A database link between the analysis result database and the laboratory information management system (LIMS) will be created so that the analysis result can be related to the experimental data.
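One way such a table might look is sketched below using Python's built-in sqlite3 module. The column names follow the field list above; the SQL types, the table name, the helper functions and the choice of SQLite are all assumptions made for illustration, and the text does not say which database engine was used.

```python
import sqlite3

# Columns mirror the fields named in the text; the types are assumed.
SCHEMA = """
CREATE TABLE blast_result (
    query            TEXT NOT NULL,
    score            REAL,
    p_value          REAL,
    hit              TEXT,
    length           INTEGER,
    annotation       TEXT,
    frame            INTEGER,
    identity         REAL,
    homology         REAL,
    query_length     INTEGER,
    subject_length   INTEGER,
    database_queried TEXT,
    method           TEXT
)
"""

def create_blast_table(conn):
    conn.execute(SCHEMA)

def add_result(conn, **fields):
    """Insert one result row; keyword names must match column names."""
    columns = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    conn.execute(
        f"INSERT INTO blast_result ({columns}) VALUES ({placeholders})",
        tuple(fields.values()),
    )

def hits_below_p(conn, method, max_p):
    """Query by two of the stored fields: analysis method and pValue."""
    rows = conn.execute(
        "SELECT query, hit FROM blast_result "
        "WHERE method = ? AND p_value <= ? ORDER BY p_value",
        (method, max_p),
    )
    return rows.fetchall()
```

Because every field is an ordinary column, any of them can serve as a filter or join key, which is what would allow a later link from these rows to LIMS records keyed on the same query identifier.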

C. Metabolic Pathway Analysis: Many metabolic pathway databases have been constructed that group proteins based on their roles in a metabolic pathway. The basic identifiers for these proteins are E.C. numbers; therefore, the position of a given enzyme in a metabolic pathway may be determined based on its E.C. number. By querying the GENETHESAURUS database (DoubleTwist, Inc., Oakland, Calif.), the E.C. number of a protein can be obtained by its GenBank ID. This approach can be used to assign the corresponding E.C. number to the hits found for each cDNA sequence. By querying the metabolic pathway using the E.C. number of a hit, a potential link between this cDNA sequence and the metabolic pathway may be established. Each link can be used as a building block for a plant metabolic pathway. This potential link between cDNA sequence and metabolic pathway provides a starting point to analyze the gene's role in a metabolic pathway.

D. Sequence Analysis of Library Created from GENEWARE Vectors: Five hundred sixty-eight (568) independent clones were sequenced from the virus expression library, and the clones from this library were analyzed by vector and N filters and BLAST analysis. Of the 568 initial sequences submitted for analysis, 131 were eliminated by the N-filter, indicating that ~15% of the sequence were undetermined Ns. The remaining 437 sequences were then subjected to analysis for duplication within each set of submitted plates. Fifty-five (55) sequences were removed due to this duplication filter. These sequences were BLASTN searched against 539 sequences from the AtwpLNLH library in Lambda Zap II. Thirty percent (30%) of the sequences (132 sequences) found a match in both libraries. From the original set of GENEWARE clones, 305 were found to be unique with respect to the Lambda Zap II library. These sequences were then BLASTX-searched against non-redundant GENBANK. From the 305 submitted sequences, 173 sequences found solid hits in protein coding sequence as determined by hit criteria and 132 were found to be unique. Further BLASTN analysis showed a range of sequence homology, but many represented hits to BAC or chromosomal sequences. A wide range of sequences was found, including ribosomal proteins, photosystem reaction center proteins, fumarase and other general metabolism proteins, transcription factors, kinase homologs, omega-6 fatty acid desaturase and various hypothetical proteins. These results strongly suggest that little or no bias is introduced during the construction of cDNA libraries in GENEWARE.

Example 6 Preparation of Infectious Units

DNA plates that pass QC testing were then moved to the next stage of the cycle, the generation of infectious units. In vitro RNA transcriptions have been optimized to produce maximal amounts of RNA in smaller volumes to reduce costs and increase the lifetime of a DNA preparation. A transcription mixture containing a 6-to-1 RNA cap structure-to-rGTP ratio, Ambion mMessage Machine buffer and enzyme mix (Ambion, Inc., Austin, Tex.) is delivered to a 96-well plate by the TECAN liquid handling robot (TECAN, Research Triangle Park, N.C.). To this reaction mix, the Robbins Scientific HYDRA 96-sample pipetting robot (Robbins Scientific, Sunnyvale, Calif.) delivers 2 μl of DNA solution. This final transcription reaction is incubated at 37° C. for 1.5 hours. Following incubation, the TECAN robot delivers 95 μl of a 100 mM Na/K PO₄ buffer containing TMV coat protein (devoid of all infectious RNA) to the transcription plate and it is incubated overnight. This incubation generates encapsidated transcripts, which are very stable at room temperature or 4° C. and amplified with regard to number of infectious units per μg of RNA transcript. The generation of infectious materials is measured by inoculation of GFP-expressing virus to systemic host or Nicotiana tabacum NN lines, incubation at permissive temperatures and counting of developing local lesions on inoculated leaves. Before addition of the TMV coat protein mixture, 0.5 μl from 8 wells of each transcription plate is removed and analyzed by agarose gel electrophoresis. The presence of an RNA band of ~1.6 to 3.5 kb is strong evidence for a successful transcription. If >25% contain only lower molecular weight RNA bands, or if the band is diffuse and runs below the 500 bp dsDNA marker, the transcription plate is considered to have failed and is removed from the stream of plates prepared for inoculation. During a two-week period, 112 plates were transcribed and 108 plates were passed for plant inoculation in growth rooms and in the field.

Example 7 Plant Inoculation with Encapsidated RNA Transcripts

In order to prepare for plant inoculation, 90 μl of each encapsidated RNA transcript sample and 90 μl of FES transcript inoculation buffer (0.1 M glycine, 0.06 M K₂HPO₄, 1% sodium pyrophosphate, 1% diatomaceous earth and 1% silicon carbide) were combined in the wells of a new 96-well plate. The 96-well plate was then placed on ice. Nicotiana benthamiana plants 14 days post sowing were removed from the greenhouse and brought into the laboratory. Humidity domes were placed over the plants to retain moisture. The RNA transcript sample was mixed by pipetting the solution prior to application to ensure that the silicon carbide and the diatomaceous earth were resuspended. The entire sample, 180 μl, was drawn up and pipetted in equal aliquots (approximately 30 μl) onto the first two true leaves of three separate Nicotiana benthamiana plants. The mixture was spread across the leaf surface using a TEXWIPE CLEANFOAM swab (The Texwipe Co, Upper Saddle River, N.J.). The wiping action caused by the swab together with the silicon carbide in the buffer sufficiently abrades the leaves so as to allow the encapsidated RNA transcript to enter the plant cell structure. Other methods used for inoculation have included pipetting of encapsidation-FES mixture onto leaves and rubbing by hand, cotton swab or nylon inoculation wand. Alternatively, nylon inoculation wands may be incubated in the transcript-FES mixture for ~30 min to soak up ~15 μl and then rubbed directly onto the leaves.

Once an entire 32-plant flat was inoculated, the plants were misted with deionized water and the humidity domes were replaced over them. The inoculated plants were retained in the laboratory for 6 hours and then returned to the greenhouse. Once in the greenhouse, the humidity domes were removed and the plants were misted a second time with deionized water.

Example 8 Inoculated Plant Growth

Plants inoculated with encapsidated virus were grown in a greenhouse. Day length was set to 16 hours and shade curtains (33% transmittance) were used to reduce solar intensity. Whenever ambient light fell below 250 μmol m⁻² s⁻¹, a 50:50 mixture of metal halide and sodium halide lamps (Sylvania), delivering an irradiance of approximately 250 μmol m⁻² s⁻¹, was used to provide supplemental lighting. Evaporative cooling and steam heat were used to regulate temperature, with a daytime set point of 27° C. and a nighttime set point of 22° C. The plants were irrigated with Hoagland's fertilizer mix as required. Drainage water was collected and treated with 0.5% sodium hypochlorite for 10 minutes before discharging into the municipal sewer.

To allow space for increased plant size, the inoculated N. benthamiana were repositioned at seven days post-inoculation (dpi) so that they occupied twice their original area. At 13 dpi, the plants were examined visually for symptoms of TMV infection and were assigned a numerical score to indicate the extent of viral infection (0=no infection, 1=possible infection, 2=limited/late infection, 3=typical infection, 4=severe infection). At the same time, the plants were assigned a fate for harvest (typically the highest quality plant in each triplicate was assigned to metabolic screens and the second highest quality plant was assigned to focused screens). In cases where plant symptoms deviated substantially from those of plants inoculated with control vectors, a description of plant phenotype was recorded (see below). At 14 dpi, infected plants were harvested.

Example 9 Infectivity Analysis

The method to measure the infectivity of the transcript encapsidations was to inoculate a set of 96-well plates from both positive and negative sense clones and look for systemic virus movement and phenotype development. Of the 8,352 plants inoculated with unique encapsidated transcriptions, 6,266 became systemically infected for an infection rate of 76%. Overall, the majority of plates generated showed very good infection rates, as shown in a graph of the number of systemically infectious constructs per individual plate plotted against plate number. The majority of plates had systemic rates >70% with one at 100%. Approximately 25 plates had infection rates ranging between 40 and 70% while only 6% (>5 plates) showed infection rates <45%.

A population of constructs did not show systemic infection on Nicotiana benthamiana plants. Analysis using the LIMS revealed a substantial correlation between a subset of inoculators and the transcription plates showing poor infection rates. These results strongly suggest that inoculation technique is critical for good infectivity, although other possible causes could include poor DNA or transcription quality, or simply inoculation error. In some cases the constructs may be restricted to inoculated leaves by way of adverse influence of the gene insertion on virus replication and movement. For example, one observed healthy inoculated Nicotiana benthamiana plant exhibited clear chlorotic spots on inoculated leaves, yet no systemic symptoms. Other plants, not scored as infected in our LIMS, were observed to have subliminal infections in source tissues. It was clear that the properties of the genetic insertion had differing effects on virus phenotypic symptoms. Eighty-two of those constructs exhibiting poor systemic infection were re-inoculated into Nicotiana tabacum NN plants to test for local lesions. The presence of local lesions indicated infectious viral vectors. From these data, a statistical calculation can be made to determine the percentage of non-systemic infective constructs that are locally infectious. Plants were scored 6 days post-inoculation for the presence of localized necrotic lesions resulting from infection and localized movement of virus vectors on the inoculated leaves of the plants. Of the 82 constructs analyzed, 50 showed local lesions indicating the presence of infectious viral vectors. Based on the infection rate observed in Nicotiana benthamiana and NN tobacco plants, we estimate that 1,181 (~61%) of the constructs not showing systemic infection on Nicotiana benthamiana plants were still infectious and amenable to biochemical analysis.

Example 10 Phenotypic Evaluation

At 13 dpi, a visual examination was made to identify plants whose phenotype deviates substantially from plants infected with a GENEWARE control. The phenotypically different plants were divided into regions (for example: shoot apical region, infected phloem source leaves, stem) and descriptive terms were applied to each region to document the visual observation. Additionally, a confirmation was made as to whether or not the operator considered the plant to be a hit, and a numerical score was applied to document the phytotoxic/herbicide effect of the RNA insert (1=possible effect, 2=mild, 3=moderate, 4=severe).

A matrix-style phenotypic database was created using the LIMS software. The LIMS software allows all descriptive terms to be used for any major part of the plant and allows sub-parts to be described. Notable phenotypic events are captured by description of individual plant parts. The matrix is configured in a Web-based page that allows one to score infection and phenotyping using a graphic replica of the physical arrangement of plants in the growth room. This approach is rapid, allowing 96 plants to be described in detail as infected or not infected, with a detailed phenotype, in 15 min. Editing of output files can occur rapidly in MS Excel if desired. The output file is then loaded as CSV files into the LIMS where it is immediately available to Boolean query as to phenotype descriptors with “and, or, not” statements. Images of infected plants are linked to the SeqIDs in the database so that the plant tray bar code (for infection), well position, SeqID, phenotype and picture all link together when a query is made. This is linked back to the sequence database for sequence annotation data. Using this system, 8,352 phenotypic observations were made in the period of two days and entered into the LIMS. Hundreds of interesting visual phenotypes were observed.
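The "and, or, not" querying of phenotype descriptors loaded from CSV can be illustrated with a small Python sketch. The CSV layout (a `seq_id` column and a semicolon-separated `descriptors` column), the function names and the sample rows are all hypothetical; the actual LIMS file format is not described here.

```python
import csv
import io

def load_observations(csv_text):
    """Parse CSV text into rows whose 'descriptors' field is a set of terms."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        terms = (t.strip() for t in row["descriptors"].split(";"))
        row["descriptors"] = {t for t in terms if t}
        rows.append(row)
    return rows

def query(rows, all_of=(), any_of=(), none_of=()):
    """Return seq_ids matching: every `all_of` term (AND), at least one
    `any_of` term when given (OR), and no `none_of` term (NOT)."""
    hits = []
    for row in rows:
        terms = row["descriptors"]
        if not set(all_of) <= terms:
            continue
        if any_of and not set(any_of) & terms:
            continue
        if set(none_of) & terms:
            continue
        hits.append(row["seq_id"])
    return hits

# Placeholder rows for illustration only (not observations from this work).
SAMPLE = """seq_id,descriptors
clone_A,stunted;chlorotic
clone_B,stunted
clone_C,necrotic;chlorotic
"""
stunted_not_chlorotic = query(load_observations(SAMPLE),
                              all_of=["stunted"], none_of=["chlorotic"])
```

Representing each plant's descriptors as a set keeps the three Boolean operators down to subset, intersection and disjointness checks.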

One measured phenotype was stunting of plant growth. Plants infected with viral vectors comprising SEQ ID NOs: 1-154 exhibited a stunting phenotype in the initial assay.

Example 11 Field-Scale Genomics

The effects of gene overexpression and gene silencing in plants may differ dramatically when plants are grown under different conditions. The Kentucky field test plots available to Large Scale Biology, Inc. provide an opportunity to subject plants to substantially different growth conditions and thereby broaden the chances of detecting various types of hits in a genomics screen. To compare the ability of virus vectors to be applied under field conditions and under controlled growth room conditions, we inoculated, in duplicate, 960 positive-sense constructs on Nicotiana benthamiana plants grown in the field test plot in Owensboro, Kentucky. This activity was concurrent with inoculations and screens performed in Vacaville, Calif. Complete encapsidated transcription reactions were prepared at Biosource Genomics in Vacaville, Calif. and, following incubation with TMV coat protein, FES buffer was added to each well. All samples in column 12 of each plate contained encapsidated transcripts of 1057 vector containing the GFP gene. The mixture was then overnight-mailed to Owensboro, Kentucky where it was inoculated onto 4-5 week post-sowing plants by rubbing cotton swabs, pre-wetted by incubation with encapsidated transcript-FES mixture, on plant leaves. Plants were inoculated in duplicate. Plants were allowed to remain in the field for 4 weeks post-inoculation and then subjected to phenotypic analysis. Photographic documentation of the plants both pre- and post-inoculation was prepared. Plants were scored by visual evaluation as to number of infected plants compared with total number of plants inoculated. Of the 1,920 plants inoculated, 1,712 (88%) showed systemic infections. More than 100 new phenotypes were noted in the field. Each was compared with the phenotype of the same construct inoculated into plants in Vacaville, Calif. growth rooms. Two new phenotypes are particularly noteworthy: two independent plants showed survival phenotypes under anaerobic conditions, whereas all neighbors had succumbed to root rot in a low spot in the field.

In order to evaluate the effect of gene silencing in Nicotiana tabacum plants, mRNA from Arabidopsis thaliana whole plants was subjected to fragment normalization such that small cDNA fragments were produced. The cDNA population showed a high degree of normalization by hybridizations with known genes of variable expression and by comparison with non-normalized cDNA fragments. The average size of the normalized fragments in the GENEWARE vectors was between 400 and 500 bp, allowing facile movement of the recombinant viruses systemically in field Nicotiana tabacum c.v. MD609 plants. A total of 11 plates of DNA constructs (1056) were prepared, transcribed and encapsidated with GFP constructs integrated at every 12th position. These were mixed with FES and overnight-mailed to Owensboro, Kentucky. These 1056 constructs were inoculated in duplicate (2112 total) on MD609 tobacco plants 11 weeks post-sowing. One set of the replicates (1056 plants) was scored by visual evaluation as to number of infected plants compared with total number of plants inoculated. Of the 1056 plants inoculated, 808 showed systemic infections, or a 76.5% infection rate. Hits were determined by unusual visual symptoms and corresponding constructs will be characterized by DNA sequencing.

An uncharacterized GENEWARE library comprised of ~20,000 Arabidopsis thaliana normalized fragment cDNAs and ~10,000 Nicotiana benthamiana genomic DNA fragments was prepared and sprayed as a population on Nicotiana tabacum c.v. MD609 plants. Of the Arabidopsis cDNA library, 10,000 fragments were ligated into prepared GENEWARE vectors, purified from pooled bacterial transformants, and transcribed as a pool. The remaining 10,000 cDNA fragments were individual clones prepared and transcribed independently and then mixed in a pooled encapsidation. The Nicotiana library was a prototype cell-free cloning library from restriction endonuclease fragmented gDNA of <500 bp in size. The number of clones corresponds to an approximation of the amount of DNA undergoing complete ligation. Transcriptions from each non-encapsidated library were inoculated separately into Nicotiana tabacum protoplasts and allowed to incubate for three days. Cells were lysed and the libraries combined. The pool of cell lysates and encapsidated transcriptions containing viral libraries was shipped to Owensboro, Kentucky, where it was inoculated onto Nicotiana tabacum c.v. MD609 plants at 1, 1/10, 1/100 and 1/1000 dilutions of the mixed virion preparation (using 60 ml, 6 ml, 0.6 ml and 0.06 ml of the library, respectively). Eight hundred (800) plants were spray-inoculated with each library virion dilution. Plants were visually scored, and of the 3,200 plants inoculated, 1,304 showed visual symptoms 3 weeks post-infection. The infectivity rate varied from ~60% for the most concentrated inoculum to ~20% for the most dilute, as would be expected due to dilution. Analysis will continue to define hits by unusual visual symptoms, and PCR amplification and DNA sequencing will be used to characterize the corresponding constructs.

Example 12 Metabolic Screens

A. Harvest and Preparation of Tissues for Metabolic Screening. Fourteen dpi infected plants to be harvested were moved from the greenhouse to the laboratory. Plants were scanned and identified by a bar code that linked the infected plant to the tissue sample. The infected tissue was cut off of the plant and placed in a corresponding centrifuge tube. A tungsten carbide ball was placed on top of the infected tissue sample to facilitate pulverization of the plant tissue. The tubes and samples were stored on dry ice during the harvesting procedure. The samples were then stored at −70° C. Before a metabolic screen was conducted, the tissue samples were pulverized: the sample tubes were loaded into a KLECO pulverizer and pulverized to create a fine powder of the tissue sample. The tissue sample powder was then weighed out into a metabolic extraction vial.

B. FAME Analysis Procedure for FAME Screen. Nicotiana benthamiana plants expressing genes of interest in RNA vectors were grown until 14 dpi as described above. Three leaf disks (0.5 cm in diameter) were placed in wells of a borosilicate 96-deepwell plate (Zinsser). 500 μl of heptane was added to each well using a Biomek 2000 Laboratory Automation Workstation. The heptane/tissue samples were stirred on a Bodine magnetic stirrer. After 30 minutes, 50 μl of 0.5 N sodium methoxide in methanol was added to each well using the Biomek 2000. After 30 minutes of stirring, 10 μl of water was added to each well. Injections were made directly from the 96-deepwell plate into a Hewlett Packard gas chromatograph (GC) using a LEAP auto injector. The GC method involved a 2 μl injection into a split/splitless injection port using a DB-23 narrow bore column (15 m, 0.25 mm I.D.). The oven temperature was isothermal at 170° C. The injector temperature was 230° C. and the detector (flame ionization) temperature was 240° C. The run time was 5 minutes, with an equilibration time of 0.5 minutes. The split ratio was 20:1 and the helium carrier gas was held at a constant pressure of 19 psi. This GC method allowed for separation and quantification of fatty acid methyl esters, which included C16:0, C16:1, C18:0, C18:1, C18:2, and C18:3. Using a dual column GC, four 96-well plates could be sampled in less than 24 hours.

The following sequences exhibited a positive FAME result (had altered levels of the fatty acids assayed): SEQ ID NOs: 151, 52, and 94. The result of the FAME analysis for SEQ ID NO: 94 is shown in Table 4. Table 7 shows the relative percent amounts of fatty acids found in plants transfected with a viral vector comprising SEQ ID NO: 94. An increase in 16:0 fatty acids was observed in 3 of the 5 samples assayed. Table 8 shows the relative percent amounts of fatty acids found in plants transfected with SEQ ID NOs: 52 and 151.

TABLE 7 FAME Profile

Sample    16:0   16:1   unk   16:3   unk   18:0   18:1   18:2   18:3   unk
1         24.7   3.4    1.1   3.2    2.6   2.6    3.3    9.2    47.8   2.0
2         20.1   2.9    0.8   4.6    2.9   3.5    7.1    9.2    46.7   2.3
3         17.6   1.8    1.0   3.5    2.9   2.2    6.0    11.8   50.4   2.7
4         23.3   1.9    1.0   3.1    4.6   3.8    8.9    10.6   37.6   5.3
5         23.0   2.6    0.7   3.5    1.6   2.3    3.8    8.1    52.9   1.6
control   19.6   2.8    1.1   3.3    1.8   1.8    3.1    12.0   53.6   1.0
control   18.4   2.7    1.1   3.3    1.7   1.7    3.1    11.3   55.4   1.3
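One way to read the 16:0 column of Table 7 is to compare each sample against the controls. The sketch below flags samples whose 16:0 value exceeds the higher control by a margin; the 10% margin is a hypothetical cutoff chosen for illustration, since the text does not state the criterion that was actually used.

```python
# Relative percent of 16:0 taken from Table 7.
samples_16_0 = {"1": 24.7, "2": 20.1, "3": 17.6, "4": 23.3, "5": 23.0}
controls_16_0 = [19.6, 18.4]

def elevated(samples, controls, margin=1.10):
    """Return sample IDs whose value exceeds the highest control by `margin`.

    The default 10% margin is an assumption, not a value from the text.
    """
    cutoff = max(controls) * margin
    return [sid for sid, value in samples.items() if value > cutoff]

print(elevated(samples_16_0, controls_16_0))   # ['1', '4', '5']
```

Under this assumed cutoff, three of the five samples are flagged.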

TABLE 8 FAME Profile

Sample           16:0   16:1   unk   16:3   unk   18:0   18:1   18:2   18:3   unk
SEQ ID NO: 52    23.0   3.5    1.9   2.6    1.7   2      3.3    11.7   49.1   1.3
SEQ ID NO: 151   25.7   3.4    1.3   1.8    0.8   2.3    2.1    8      54.7   0
control          18.7   2.8    1.2   3.8    1.4   1.5    4.2    10.7   55     0.6

C. Insect Control Bioassays. Nicotiana benthamiana plants expressing genes of interest in RNA viral vectors were grown until 14 dpi as described previously. Fresh leaf tissue (sample size ~2.5 cm diameter) was excised from the base of infected leaves using a scalpel and placed in insect-rearing tray (Bio RT32, C-D International) wells containing 3 ml of 2% agar. Using a small paintbrush to handle insects, 2 first-instar larvae of tobacco hornworm (Manduca sexta) were placed in each well and trays were sealed using vented covers. Trays were then incubated at 28° C. with 45% humidity for 72 hours with a 12-hour photoperiod. Following incubation, samples were scored for mortality and leaf damage according to the following criteria: mortality, 0=0 dead/2 alive; 1=1 dead/1 alive; 2=2 dead/0 alive; leaf damage, 0=0 to 20% leaf consumed; 1=21 to 40% leaf consumed; 2=41 to 60% leaf consumed; 3=61 to 80% leaf consumed; and 4=81 to 100% leaf consumed. Following scoring, insects were weighed on an analytical balance and photographed using a digital camera.
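The scoring criteria above can be expressed as two small functions. This is a minimal sketch with illustrative names; it assumes that a fractional percentage falling between two bands (for example 20.5%) is assigned to the upper band, which the text does not specify.

```python
import math

def mortality_score(dead):
    """Score 0, 1 or 2 for the number of dead larvae out of the 2 placed per well."""
    if dead not in (0, 1, 2):
        raise ValueError("each well holds exactly 2 larvae")
    return dead

def leaf_damage_score(percent_consumed):
    """Map percent leaf consumed to the 0-4 scale (0-20, 21-40, 41-60, 61-80, 81-100)."""
    if not 0 <= percent_consumed <= 100:
        raise ValueError("percent must be between 0 and 100")
    return max(0, math.ceil(percent_consumed / 20) - 1)

print(mortality_score(1), leaf_damage_score(45))   # 1 2
```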

The following sequences exhibited a positive insect control phenotype: SEQ ID NOs: 3, 150, 151, 26, 31, 36, 58, 78, 94, 106, 107, 110, 112, 113, 114, 117, 123.

D. Carbohydrate Screen. The dry residue (10-20 mg) was transferred from the extracting cartridge into a 100×13 mm glass tube containing 0.5 ml of 0.5 N HCl in methanol and 0.12 ml of methyl acetate, and the tube was then sealed (Teflon coated screw cap) under nitrogen and heated for 16 hours at 80° C. The liquid phase was then transferred using an 8-channel pipetter (Matrix) to a glass insert supported by a 96 well aluminum block plate (Modem Metal Craft) and evaporated to dryness (Concentrator Evaparray). The methyl-glycosides and methyl-glycoside methyl esters were silylated in 0.1 ml pyridine and 0.1 ml BSTFA+1% TMCS at room temperature for one hour. The sample generated was analyzed on a DB1 capillary column (15 meters) with an 11-minute temperature program (160° C. to 190° C. at 5° C./min, then 190° C. to 298° C. at 36° C./min, with a 2-minute hold) and a 3-minute equilibration time. The following components of the plant cell wall were identified in the tobacco sample: arabinose, rhamnose, xylose, galactose, galacturonic acid, mannose, glucuronic acid and glucose.
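The stated 11-minute program length follows from the ramp rates and the final hold. The sketch below reproduces that arithmetic; the helper name and the tuple layout are illustrative only.

```python
def program_minutes(ramps, hold_min=0.0):
    """Total run time for (start_C, end_C, rate_C_per_min) ramps plus a final hold."""
    ramp_time = sum((end - start) / rate for start, end, rate in ramps)
    return ramp_time + hold_min

# Two ramps from the carbohydrate screen: 160->190 at 5 C/min, 190->298 at 36 C/min.
carbohydrate_program = [(160, 190, 5), (190, 298, 36)]
print(program_minutes(carbohydrate_program, hold_min=2))   # 11.0
```

The two ramps take 6 and 3 minutes respectively, and the 2-minute hold brings the total to 11 minutes, excluding the 3-minute equilibration.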

E. GC/MS Metabolite Analysis: A 3 mm tungsten carbide ball bearing was placed into each well of a 96-well deep well block and 300 μl of grinding buffer (2 mM NaOH, 1 mM PMSF, 10 mM beta-mercaptoethanol, and deuterium-labeled compounds) was added to each well. A 13 mm circular leaf disc plug (~20 mg) from apical leaves of 4 week old Nicotiana benthamiana (2 weeks post-inoculation) was placed into the 96-well microtiter deepwell plate. The plate was tightly sealed and placed on a mechanical shaker (paint mixer, up to four at a time) for 2 min, then rotated 180° and shaken for an additional 2 min. Subsequently, the samples were spun for 10 min at 3200 RPM in a refrigerated (15° C.) centrifuge equipped for microtiter plates. Following centrifugation, the 96-well plate containing the homogenized samples was placed on a TECAN GENESIS RSP 200 (TECAN, Research Triangle Park, N.C.) liquid handler/robotics system. Both Logic and Gemini software were used to control the TECAN liquid handler. Approximately 200 μl was transferred to a pre-conditioned (1 ml MeOH followed by 1 ml of distilled deionized H₂O) Waters 96-well Oasis HLB solid phase extraction (SPE) plate by the TECAN liquid handler for metabolite analysis by GC/MS. The Waters Extraction Plate Manifold Kit and a vacuum not greater than 5 mm Hg were used to aspirate plant samples from the SPE plate into a waste reservoir. The SPE plate was then washed with 1 ml of 5% MeOH in H₂O by aspirating into the waste reservoir, and compounds were eluted from the SPE resin with 350 μl of MeOH into a 96-well collection plate. Samples were then transferred to GC autosampler vials, capped and stored in the freezer at −80° C. for metabolite analysis.

An internal standard solution was prepared by making a stock solution at a concentration of 1 μg/μl (using compound density). Grinding buffer (2 mM NaOH above) with the internal standard was prepared at a concentration of 10 ng/μl for each (3,000 ng/300 μl) to yield a concentration equivalent of approximately 150 ng/mg wet weight of plant tissue. Following extraction of plant material, this solution was transferred to the SPE plate by the TECAN liquid handler and extracted with 350 μl of MeOH. Approximately 20 μl of the sample will be injected onto a 30 m×0.32 mm DB-WAX (1 μm film thickness) GC column with a large volume injector during the preliminary study. The GC column oven temperature was held at 35° C. for 5 min, then programmed at 2.5° C./min to 250° C. and held for 15 min.
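The internal standard figures are related by simple arithmetic, sketched below. The 20 mg tissue mass is taken from the approximate leaf disc weight given for the extraction; the function name is illustrative.

```python
def standard_per_mg(conc_ng_per_ul, buffer_ul, tissue_mg):
    """Return (total ng of standard, ng of standard per mg wet tissue)."""
    total_ng = conc_ng_per_ul * buffer_ul
    return total_ng, total_ng / tissue_mg

# 10 ng/ul in 300 ul of grinding buffer, applied to a leaf disc of roughly 20 mg.
total_ng, ng_per_mg = standard_per_mg(conc_ng_per_ul=10, buffer_ul=300, tissue_mg=20)
print(total_ng, ng_per_mg)   # 3000 150.0
```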

Samples that contained peaks present at altered levels relative to control samples, as identified from chromatograms, were further analyzed using mass spectroscopy. Samples that were transfected with the following nucleic acid sequences were found to have altered metabolic profiles: SEQ ID NO: 43, 49, 79, 84, and 94. Table 9 shows the retention time and % change in peaks relative to controls for several sequences. Table 9 also shows the identity of the peaks as determined by mass spectroscopy.

TABLE 9 Metabolic Profiles

SEQ ID NO   RT (MIN)   % Change   Compound
43          10.68      +130       Malic Acid
43          11.63      +250       Ribonic Acid; Gamma-lactone
43          12.93      +260       Quinic Acid
43          14.12      +120       Inositol
79          10.67      +300       Malic Acid
79          10.87      +150       L-Aspartic Acid
79          10.92      +80        5-Oxo-L-Proline (pyroglutamic)
79          12.48      +100       Ribonic Acid
79          12.64      +800       Citric Acid
79          16.44      +60        Sucrose
94 FA       9.31       −95        Dodecanoic Acid (12:0)
94 FA       10.28      −90        Myristic Acid (14:0)
94 FA       11.20      +500       Hexadecenoic Acid (16:1)
94 FA       11.96      +200       Oleic Acid (18:1)
94          10.68      +700       Malic Acid
94          11.63      +300       Ribonic Acid; Gamma-lactone
94          12.33      +300       Phosphoric Acid
94          12.65      −1400      Citric Acid
94          12.93      +500       Quinic Acid
94          14.12      +800       Inositol
49          11.0       New
49          11.7       New

Example 13

Protein Profiling by MALDI-TOF

Approximately 14 days post-inoculation, 960 different N. benthamiana leaf plugs transfected with encapsidated virion from a GENEWARE expression library from growth rooms and 38 from N. benthamiana infected in Owensboro, Kentucky were collected and the soluble proteins extracted with a high throughput micro-extraction technique described below. An aliquot of this solution was automatically diluted with matrix by a liquid handler in preparation for analysis by MALDI-TOF mass spectrometry for proteins.

A. Sample Preparation by High Throughput Micro-Extraction: A 3 mm tungsten carbide ball bearing was placed into each well of a 96-well deep well block and 300 μl of grinding buffer (2 mM NaOH, 1 mM PMSF, 10 mM beta-mercaptoethanol, and deuterium-labeled compounds for GC/MS analysis) was added to each well. A 13 mm circular leaf disc plug (~20 mg) from apical leaves of ~4 week old Nicotiana benthamiana (2 weeks post-inoculation) was placed into the 96-well microtiter deepwell plate. The plate was tightly sealed and placed on a mechanical shaker (paint mixer, up to four at a time) for 2 min, then rotated 180° and shaken for an additional 2 min. Subsequently, the samples were spun for 10 min at 3200 RPM in a refrigerated (15° C.) centrifuge equipped for microtiter plates. Following centrifugation, the 96-well plate containing the homogenized samples was placed on a TECAN GENESIS RSP 200 (TECAN, Research Triangle Park, N.C.) liquid handler/robotics system. Both Logic and Gemini software were used to control the TECAN liquid handler. Samples were diluted by the TECAN liquid handler in a round bottom 96-well plate for MALDI-TOF analysis by adding 18 μl of sinapinic acid matrix and 2 μl of plant extract to each well. Samples were mixed well by aspirating/dispensing 10 μl volumes five times. A 2 μl aliquot of each sample was spotted onto a 100 sample MALDI plate. In addition, a 5.0 μl aliquot of each sample was transferred to a 96-well microtiter plate for PCR and/or MALDI backup analysis and stored at −80° C. Two plant trays, each containing 96 individually infected plants, were extracted each day for 5 days.

B. MALDI-TOF Mass Spectrometry Analysis: An aliquot of each homogenized plant sample was diluted 1:10 with sinapinic acid (Aldrich, Milwaukee, Wis.) matrix, and 2 μl was applied to a stainless steel MALDI plate surface and allowed to air dry for analysis. The sinapinic acid was prepared at a concentration of 10 mg/ml in 0.1% TFA/acetonitrile (70/30) by volume. MALDI-TOF mass spectra were obtained with a PerSeptive Biosystems Voyager DE-PRO operated in the linear mode. A pulsed nitrogen laser operating at 337 nm was used in the delayed extraction mode for ionization. An acceleration voltage of 25 kV with a 90% grid voltage and a 0.1% guide wire voltage was used. Approximately 150 scans were acquired and averaged over the mass range of 2000-156,000 Da. with a low mass gate of 2000. Ion source and mirror pressures were approximately 2.2×10⁻⁷ and 8×10⁻⁸ Torr, respectively. All spectra were mass calibrated with a single-point fit using horse apomyoglobin (16,952 Da).

C. Results: This study describes a method that was developed using the high-throughput capabilities of MALDI-TOF MS to detect changes in total protein profiles of crude plant extracts derived from a Genomics GENEWARE expression library. As many as 192 samples per day were extracted and analyzed for protein profiling using MALDI-TOF mass spectrometry. In addition, the method has been optimized in house for detection of a wide range of protein masses from one MALDI-TOF scan. More than 50 proteins were routinely detected in a MALDI profile spectrum ranging from approx. 3,000 to 110,000 Da. In addition to the coat protein (~17,500 Da), both small (~14,500 Da) and large (~52,750 Da) subunits of RuDP carboxylase were routinely detected in the plant samples. Several other proteins were common to most of the plants analyzed. The most abundant proteins were observed at around 3,386, 3,970, 4,408, 5,230, 7,280 (doubly charged ion for small sub-unit of RuDP carboxylase), 8,334, 9,350, 10,450 (most abundant protein overall), 14,020, 18,006, 19,628, 20,286, 21,173, 24,014, 25,124 and 29,140 (dimer of small sub-unit) Daltons. A series of less abundant proteins was also detected. Up-regulated or novel proteins were detected in 17.3% of the 960 spectra that were analyzed. This file was entered into the LIMS database.
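The assignments of the 7,280 and 29,140 peaks to the small subunit can be sanity-checked from its approximate mass. The sketch below computes the expected m/z for a doubly protonated monomer and a singly protonated dimer; the 1% tolerance is an assumption chosen for illustration, not a figure from the study.

```python
PROTON_MASS = 1.00728  # Da

def expected_mz(monomer_mass, n_units=1, charge=1):
    """m/z of an ion made of n_units protein copies carrying `charge` protons."""
    return (n_units * monomer_mass + charge * PROTON_MASS) / charge

def within(observed, expected, tolerance=0.01):
    """True when observed and expected agree to within a relative tolerance (assumed 1%)."""
    return abs(observed - expected) / expected <= tolerance

small_subunit = 14500  # approximate mass quoted in the text
print(round(expected_mz(small_subunit, charge=2)))             # 7251
print(round(expected_mz(small_subunit, n_units=2)))            # 29001
print(within(7280, expected_mz(small_subunit, charge=2)))      # True
print(within(29140, expected_mz(small_subunit, n_units=2)))    # True
```

Both observed peaks sit about 0.5% above the values predicted from the rounded 14,500 Da figure, which suggests the subunit mass is slightly higher than the approximate value quoted.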

Example 14

PFam Analysis

In addition to the PFam analysis described above (Tables 1-3), PFam analysis was performed on the remainder of the nucleic acid sequences of the present invention. The results for sequences that fit into a protein family as determined by PFam are shown in Table 10.

The nucleic acid sequences were further analyzed to determine the origin of the nucleic acid. The following sequences were found to be Nicotiana benthamiana sequences: SEQ ID NOs: 1, 2, 81, 121, 123, 124, 125, 129. The following sequences were found to be rice sequences: SEQ ID NOs: 130, 133, 134, 149. The following sequences were found to be poppy sequences: SEQ ID NOs: 148, 146, 145, 144, 143, 142, 141, 140. The remainder of SEQ ID NOs 1-154 not listed above were found to be Arabidopsis sequences.

TABLE 10 PFam Analysis

SEQ ID NO:   PFam Family         Score    P-Value
4            Ribosomal S11       191.8    9.6e−59
8            Serine carbpept     154      2.6e−42
9            RF-1                83.7     2.3e−26
11           Spermine synthase   176.4    4.6e−49
15           RF-1                83.7     2.3e−26
23           Aminotran-3         67.3     7.7e−23
26           Seedstore-2S        58.8     1.2e−13
31           RuBisCO-small       226.3    4.5e−64
36           Carb-anhydrase      23.7     1.4e−05
44           IF4E                86.1     7.4e−22
45           Peroxidase          106.2    6.5e−28
46           MATH                25.9     7.4e−06
50           Peroxidase          97.5     2.6e−25
51           PsbP                188.9    8.2e−53
54           RuBisCO-small       201.8    1e−56
61           Ribosomal-L14       193.5    3.3e−54
74           Ribosomal-L18p      52.3     1e−11
79           Histone             113.1    5.5e−30
83           Pectinesterase      −26.5    6.8e−08
86           HSF-DNA-bind        126.2    4.1e−36
88           RCC1                51.3     3.5e−13
90           Ubiquitin           362.9    3.4e−105
99           Ubiquitin           216.4    4.4e−61
102          Ribosomal-S16       110.8    2.7e−29
103          AP2-domain          158.3    1.3e−43
106          Ribosomal-L21e      90.7     2.8e−23
107          Ribosomal-S20p      64.1     2.9e−15
111          Ribosomal-L35Ae     163.3    4.2e−45
113          TCTP                239.4    5.1e−68
117          Ribosomal-L11       149.1    7.5e−41

Example 15 Ortholog and Homolog Analysis

The nucleic acid sequences of the present invention were further analyzed by translating the nucleic acid sequence into the predicted polypeptide sequence. The corresponding amino acid sequence was then used to search protein databases for orthologs and homologs.
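The translation step can be illustrated with a minimal sketch using the standard genetic code. The function name, the choice to stop at the first stop codon, and the use of X for ambiguous codons are assumptions for illustration; the text does not describe the software that was actually used.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO_ACIDS.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna, frame=0):
    """Translate a DNA sequence in the given frame, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCCAAGTGA"))   # MAK
```

The resulting string is what would be submitted as the query in a protein database search.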

Example 16 ABRC Library Construction in GENEWARE Expression Vectors

Expressed sequence tag (EST) clones were obtained from the Arabidopsis Biological Resource Center (ABRC; The Ohio State University, Columbus, Ohio 43210). These clones originated from Michigan State University (from the labs of Dr. Thomas Newman of the DOE Plant Research Laboratory and Dr. Chris Somerville, Carnegie Institution of Washington) and from the Centre National de la Recherche Scientifique Project (CNRS project; donated by the Groupement De Recherche 1003, Centre National de la Recherche Scientifique, Dr. Bernard Lescure and colleagues). The clones were derived from cDNA libraries isolated from various tissues of Arabidopsis thaliana var Columbia. A clone set of 11,982 clones was received as glycerol stocks arrayed in 96 well plates, each with an ABRC identifier and associated EST sequence.

An ORF finding algorithm was applied to the EST clone set to find potential full-length genes. Approximately 3,200 full-length genes were found and used to make GENEWARE constructs in the sense orientation. Five thousand of the remaining clones (not full-length) were used to make GENEWARE constructs in the antisense orientation.
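The text does not describe the ORF finding algorithm itself. As a generic illustration only, the sketch below scans the forward strand for the longest ATG-to-stop reading frame and applies a hypothetical length cutoff (`min_codons`) to decide whether a clone looks full-length; both the function names and the cutoff are assumptions.

```python
import re

STOPS = {"TAA", "TAG", "TGA"}

def longest_orf(dna):
    """Return the longest ATG...stop open reading frame on the forward strand, or ''."""
    dna = dna.upper()
    best = ""
    for match in re.finditer("ATG", dna):
        start = match.start()
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOPS:
                orf = dna[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                break  # this start codon is closed by its first in-frame stop
    return best

def looks_full_length(dna, min_codons=50):
    """Hypothetical criterion: a complete ORF of at least `min_codons` codons."""
    return len(longest_orf(dna)) // 3 >= min_codons

print(longest_orf("CCATGAAATAGGG"))   # ATGAAATAG
```

A production pipeline would also consider the reverse strand and evidence such as 5′ untranslated sequence, neither of which is modelled here.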

Full-length clones used to make constructs in the sense orientation were grown and DNA was isolated using Qiagen (Qiagen Inc., Valencia, Calif. 91355) mini-preps. Each clone was digested with the eight base pair cutters NotI and Sse8387I. The resultant fragments were individually isolated and then combined. The combined fragments were ligated into pGTN P/N vector (with the polylinker extending from PstI to NotI, 5′ to 3′). For each set of 96 original clones, approximately 192 colonies were picked from the pooled GENEWARE ligations, grown until confluent in deep-well 96-well plates, DNA prepped and sequenced. Matches between the sequenced clones and the ABRC EST data were checked bioinformatically by BLAST, and a list of missing clones was generated. Pools of clones found to be missing were prepared and subjected to the same process. The entire process resulted in greater than 3,000 full-length sense clones.

The negative sense clones were processed in the same manner, but ligated into pGTN N/P vector (with the polylinker extending from NotI to PstI, 5′ to 3′). For each set of 96 original clones, approximately 192 colonies were picked from the pooled GENEWARE ligations and DNA prepped. The DNA from the GENEWARE ligations was subjected to RFLP analysis using the four base cutter TaqI. Novel patterns were identified for each set. The RFLP method was applicable only for comparison within a single ABRC plate. This procedure resulted in greater than 6,000 negative sense clones.
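The idea of picking clones with novel RFLP patterns can be sketched in silico. The example below digests sequences at the TaqI site (T^CGA) and keeps the first clone seen for each distinct set of fragment lengths within one plate. It treats each sequence as linear and compares exact lengths, both simplifications relative to reading bands on a gel; all names are illustrative.

```python
import re

TAQI_SITE = "TCGA"   # TaqI recognises T^CGA and cuts after the first T

def taqi_fragments(dna):
    """Sorted fragment lengths from a TaqI digest of a linear sequence (a simplification)."""
    dna = dna.upper()
    cuts = [m.start() + 1 for m in re.finditer(f"(?={TAQI_SITE})", dna)]
    edges = [0] + cuts + [len(dna)]
    return tuple(sorted(b - a for a, b in zip(edges, edges[1:])))

def novel_clones(plate):
    """Keep the first clone seen for each distinct fragment pattern within one plate."""
    seen, keep = set(), []
    for name, dna in plate.items():
        pattern = taqi_fragments(dna)
        if pattern not in seen:
            seen.add(pattern)
            keep.append(name)
    return keep

plate = {"a": "AATCGAGG", "b": "GGTCGAAA", "c": "AAAAAAAA"}
print(novel_clones(plate))   # ['a', 'c']
```

Because different inserts can share a fragment pattern, this kind of comparison only makes sense among the limited set of clones expected from a single source plate, consistent with the restriction stated above.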

The identified clones were re-arrayed, transcribed, encapsidated andused to inoculate plants.

Example 17 Inoculation of plants

A. Plant Growth. N. benthamiana seeds were sown in 6.5 cm pots filled with Redi-earth medium (Scotts) that had been pre-wetted with fertilizer solution (prepared by mixing 147 kg Peters Excel 15-5-15 Cal-Mag (The Scotts Company, Marysville, Ohio), 68 kg Peters Excel 15-0-0 Cal-Lite (15% Ca), and 45 kg Peters Excel 10-0-0 MagNitrate (10% Mg) in hot tap water to 596 liters total volume and then injecting this concentrate into irrigation water using an injection system (H. E. Anderson, Muskogee, Oklahoma) at a ratio of 200:1). Seeded pots were placed in the greenhouse for 1 d, transferred to a germination chamber (Carolina Greenhouses, Kinston, N.C.) set to 27° C. for 2 d, and then returned to the greenhouse. Shade curtains (33% transmittance) were used to reduce solar intensity in the greenhouse, and artificial lighting, a 1:1 mixture of metal halide and high pressure sodium lamps (Sylvania) that delivered an irradiance of approximately 220 μmol m⁻² s⁻¹, was used to extend day length to 16 h and to supplement solar radiation on overcast days. Evaporative cooling and steam heat were used to regulate greenhouse temperature, maintaining a daytime set point of 27° C. and a nighttime set point of 22° C. At approximately 7 days post sowing (dps), seedlings were thinned to one seedling per pot, and at 17 to 21 dps the pots were spaced farther apart to accommodate plant growth. Plants were watered with Hoagland nutrient solution as required. Following inoculation, waste irrigation water was collected and treated with 0.5% sodium hypochlorite for 10 minutes to neutralize any viral contamination before discharging into the municipal sewer.

B. Inoculation. For each GENEWARE™ clone, 180 μL of inoculum was prepared by combining equal volumes of encapsidated RNA transcript and FES buffer (0.1 M glycine, 0.06 M K₂HPO₄, 1% sodium pyrophosphate, 1% diatomaceous earth (Sigma), and either 1% silicon carbide (Aldrich) or 1% Bentonite (Sigma)). The inoculum was applied to three greenhouse-grown Nicotiana benthamiana plants at 14 or 17 days post sowing (dps) by distributing it onto the upper surface of one pair of leaves of each plant (~30 μL per leaf). Either the first pair of leaves or the second pair of leaves above the cotyledons was inoculated on 14 or 17 dps plants, respectively. The inoculum was spread across the leaf surface using one of two different procedures. The first procedure utilized a Cleanfoam swab (Texwipe Co, NJ) to spread the inoculum across the surface of the leaf while the leaf was supported with a plastic pot label (¾×5 2M/RL, White Thermal Pot Label, United Label). The second used a 3″ cotton tipped applicator (Calapro Swab, Fisher Scientific) to spread the inoculum and a gloved finger to support the leaf. Following inoculation the plants were misted with deionized water.
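The 180 μL figure matches three plants, two leaves per plant and about 30 μL per leaf. A minimal sketch of that bookkeeping, with an illustrative function name:

```python
def inoculum_volume_ul(plants=3, leaves_per_plant=2, ul_per_leaf=30):
    """Total inoculum per clone and the equal-volume share of each component."""
    total = plants * leaves_per_plant * ul_per_leaf
    return {"total": total, "transcript": total / 2, "fes_buffer": total / 2}

print(inoculum_volume_ul())   # {'total': 180, 'transcript': 90.0, 'fes_buffer': 90.0}
```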

C. Infection. At 13 days post inoculation (dpi), the plants were examined visually and a numerical score was assigned to each plant to indicate the extent of viral infection symptoms: 0=no infection, 1=possible infection, 2=infection symptoms limited to leaves <50-75% fully expanded, 3=typical infection, 4=atypically severe infection, often accompanied by moderate to severe wilting and/or necrosis.

Example 18 Phenotypic Evaluation

At 13 dpi plants were examined and, in cases where a plant's visual phenotype deviated substantially from the phenotypes of control plants, a controlled vocabulary utilizing a five-part phrase was used to describe the plants.

Phrase: plant region/subpart/modifier (optional)/symptom/severity.

Plant regions: sink leaves (the upper region of the plant considered to be primarily phloem sink tissue at the time of evaluation), source leaves (expanded, fully-infected leaves considered to be phloem source tissue at the time of evaluation), bypassed leaves (leaves [three and four] that display little or no infection symptoms), inoculated leaves (leaves one and two), stem.

Subparts: blade, entire, flower, foci, intervein, leaf, lower, major vein, margin, minor vein, node, petiole, shoot apex, upper, vein, viral path.

Modifiers: apical, associated, banded, basal, blotchy, bright, central, crinkled, dark, epinastic, flecked, glossy, gray, hyponastic, increased, intermittent, large-spotted, light, light-colored, light-green, mottled, narrowed, orange, patchy, patterned, radial, reduced, ringspot, small-spotted, smooth, spotted, streaked, subtending, uniform, unusual, white.

Symptoms: bleaching, chlorosis, color, contortion, corrugation, curling, dark green, elongation, etching, hyperbranching, mild symptoms, necrosis, patterning, recovery, stunting, texture, trichomes, wilting.

Severity: 1=extremely mild/trace, 2=mild symptom (<30% of subpart affected), 3=moderate symptom (30%-70% of subpart affected), 4=severe symptom (>70% of subpart affected).

Based on the symptoms, a phenotypic hit value (PHV) and a herbicide hit value (HHV) were assigned to each plant phenotyped.

Phenotype Hit Value: 1=no predicted value; do not request for repeat analysis, 2=of uncertain value, 3=of potential value; strong phenotype, 4=highly unusual phenotype.

Herbicide Hit Value: 1=no predicted value; do not request for repeat analysis, 2=of uncertain value, 3=moderate chlorosis (especially in apical region) or necrosis, 4=severe phytotoxicity/herbicide mode of action.

Comments were added if additional information was required to complete the plant characterization. Results are presented in Table 11.
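The five-part phrase lends itself to a small validated record type. The sketch below is one possible representation, not the system used in the study: it checks only the plant region, symptom and severity against the vocabulary (subparts and modifiers are left unchecked for brevity), and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

REGIONS = {"sink leaves", "source leaves", "bypassed leaves", "inoculated leaves", "stem"}
SYMPTOMS = {"bleaching", "chlorosis", "color", "contortion", "corrugation", "curling",
            "dark green", "elongation", "etching", "hyperbranching", "mild symptoms",
            "necrosis", "patterning", "recovery", "stunting", "texture", "trichomes",
            "wilting"}

@dataclass(frozen=True)
class PhenotypePhrase:
    region: str
    subpart: str
    symptom: str
    severity: int
    modifier: Optional[str] = None   # the only optional part of the phrase

    def __post_init__(self):
        if self.region not in REGIONS:
            raise ValueError(f"unknown plant region: {self.region}")
        if self.symptom not in SYMPTOMS:
            raise ValueError(f"unknown symptom: {self.symptom}")
        if self.severity not in (1, 2, 3, 4):
            raise ValueError("severity must be 1-4")

    def __str__(self):
        parts = [self.region, self.subpart, self.modifier, self.symptom, str(self.severity)]
        return "/".join(p for p in parts if p)

def severity_from_percent(percent_affected):
    """Map percent of subpart affected to severity 2-4; severity 1 (trace) is judged by eye."""
    if percent_affected < 30:
        return 2
    return 3 if percent_affected <= 70 else 4

print(PhenotypePhrase("sink leaves", "blade", "chlorosis", 3, "mottled"))
# sink leaves/blade/mottled/chlorosis/3
```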

TABLE 11 Summary of

SEQ ID NO             DAS/LSBC ID      Library   Visual Phenotype
SEQ ID NO: 149, 336   GBSG0000175736   RICE/OJ   Stunting
SEQ ID NO: 10, 288    GBSG000025015    ABRC      Stunting
SEQ ID NO: 26, 291    GBSG000025104    ABRC      Stunting
SEQ ID NO: 47         GBSG000025168    ABRC      Stunting
SEQ ID NO: 48, 297    GBSG000025170    ABRC      Stunting
SEQ ID NO: 58         GBSG000025427    ABRC      Stunting
SEQ ID NO: 59, 302    GBSG000025431    ABRC      Stunting
SEQ ID NO: 69, 304    GBSG000027424    ARAB      Stunting
SEQ ID NO: 83, 313    GBSG000030087    ABRC      Stunting
SEQ ID NO: 102, 321   GBSG000045801    ABRC      Stunting
SEQ ID NO: 103, 322   GBSG000045804    ABRC      Stunting
SEQ ID NO: 105, 323   GBSG000045808    ABRC      Stunting
SEQ ID NO: 106, 324   GBSG000045820    ABRC      Stunting
SEQ ID NO: 107, 325   GBSG000045837    ABRC      Stunting
SEQ ID NO: 109, 326   GBSG000045850    ABRC      Stunting
SEQ ID NO: 110, 327   GBSG000045853    ABRC      Stunting
SEQ ID NO: 111, 328   GBSG000045855    ABRC      Stunting
SEQ ID NO: 112, 329   GBSG000045864    ABRC      Stunting
SEQ ID NO: 113, 330   GBSG000045866    ABRC      Stunting
SEQ ID NO: 114, 331   GBSG000045869    ABRC      Stunting
SEQ ID NO: 115, 332   GBSG000045874    ABRC      Stunting

Example 19 Metabolic Screens

A. Sample Generation. Individual dwarf tobacco (Nicotiana benthamiana, Nb) plants were manually transfected with a unique DNA sequence at 14 or 17 days post sowing using the GENEWARE™ viral vector technology (1). Plants were grown and maintained under greenhouse conditions. At 13 days after infection, an infection rating of 0, 1, 2, 3, or 4 was assigned to each plant. The infection rating documents the degree of infection based on a visual observation. A score of 0 indicates no visual infection. Scores of 1 and 2 indicate varying degrees of partial infection. A score of 3 indicates optimum spread of systemic infection. A score of 4 indicates a plant with a massive overload of infection; the plant is either dead or near death.

Samples were grouped into sets of up to 96 samples per set for inoculation, harvesting and analysis. Each sample set (SDG) included 8 negative control (reference) samples, up to 80 unknown (test) samples, and 8 quality control samples.

B. Harvesting. At 14 days after infection, infected leaf tissue, excluding stems and petioles, was harvested from plants with an infection score of 3. Infected tissue was placed in a labeled, 50-milliliter (mL) plastic centrifuge tube containing a tungsten carbide ball approximately 1 cm in diameter. The tube was immediately capped and dipped in liquid nitrogen for approximately 20 seconds to freeze the sample as quickly as possible, minimizing degradation of the sample due to biological processes triggered by the harvesting process. Harvested samples were maintained at −80° C. between harvest and analysis. Each sample was assigned a unique identifier, which was used to correlate the plant tissue to the DNA sequence that the plant was transfected with. Each sample set was assigned a unique identifier, which is referred to as the harvest or meta rack ID.

C. Extraction. Prior to analysis, the frozen sample was homogenized by placing the centrifuge tube on a mechanical shaker. The action of the tungsten carbide ball during approximately 30 seconds of vigorous shaking reduced the frozen whole leaf tissue to a finely homogenized frozen powder. Approximately 1 gram of the frozen powder was extracted with 7.5 mL of a solution of isopropanol (IPA):water 70:30 (v:v) by shaking at room temperature for 30 minutes.

D. Fractionation. A 1200 microliter (μL) aliquot of the IPA:water extract was partitioned with 1200 μL of hexane. The hexane layer was removed to a clean glass container. This hexane extract is referred to as fraction 1 (F1). A 90 μL aliquot of the hexane-extracted IPA:water extract was removed to a clean glass container. This aliquot is referred to as fraction 4 (F4). The remaining hexane-extracted IPA:water extract is referred to as fraction 3 (F3). A 200 μL aliquot of the IPA:water extract was transferred to a clean glass container and referred to as fraction 2 (F2). Each fraction for each sample was assigned a unique aliquot ID (sample name).

E. Sample Preparation & Data Generation

Fraction 1: The hexane extract was evaporated to dryness under nitrogen at room temperature. The sample containers were sealed and stored at 4° C. prior to analysis, if storage was required. Immediately prior to capillary gas chromatographic analysis using flame ionization detection (GC/FID), the F1 residue was reconstituted with 120 μL of hexane containing pentacosane and hexatriacontane, which were used as internal standards for the F1 analyses. The chromatographic data files generated following GC separation and flame ionization detection were named with the fraction 1 aliquot ID for each sample and stored in a folder named after the harvest rack (sample set) ID. FIG. 8 a summarizes the GC/FID parameters used to analyze fraction 1 samples.

Fraction 2: The F2 aliquot was evaporated to dryness under nitrogen at room temperature and reconstituted in heptane containing 2 internal standards, C11:0 and C24:0. In general, fraction 2 is designed to analyze esterified fatty acids, such as phospholipids, triacylglycerides, and thioesters. In order to analyze these compounds by GC/FID, they were transmethylated to their respective methyl esters by addition of sodium methoxide in methanol and heat. Excess reagent was quenched by the addition of a small amount of water, which results in phase separation. The fatty acid methyl esters (FAMEs) were contained in the organic phase. FIG. 8 b summarizes the GC/FID parameters used to analyze fraction 2 samples.

Fraction 3: The F3 aliquot was evaporated to dryness under nitrogen at 40° C. In general, the metabolites in this fraction are highly polar and water-soluble. In order to analyze these compounds by GC/FID, the polar functional groups on these compounds were silylated through a 2-step derivatization process. Initially, the residue was reconstituted with 400 μL of pyridine containing hydroxylamine hydrochloride (25 mg/ml) and the internal standard, n-octyl-β-D-glucopyranoside (OXIME solution). The derivatization was completed by the addition of 400 μL of the commercially available reagent N,O-bis(trimethylsilyl)trifluoroacetamide + 1% trimethylchlorosilane (BSTFA+1% TMCS). The chromatographic data files generated following GC separation and flame ionization detection were named with the fraction 3 aliquot ID for each sample and stored in a folder named after the harvest rack (sample set) ID. FIG. 8 c summarizes the GC/FID parameters used to analyze fraction 3 samples.

Fraction 4: The F4 aliquot was diluted with 90 μL of distilled water and 20 μL of a 0.1 N hydrochloric acid solution containing norvaline and sarcosine, which are amino acids that are used as internal standards for the amino acid analysis. Immediately prior to high performance liquid chromatographic analysis using fluorescence detection (HPLC/FLD), the amino acids in F4 were mixed in the HPLC injector at room temperature with buffered ortho-phthalaldehyde solution, which derivatizes primary amino acids, followed by fluorenylmethyl chloroformate, which derivatizes secondary amino acids. Following HPLC separation and fluorescence detection, chromatographic data files were generated for each sample, named with a sequential number which can be tracked back to the F4 aliquot ID, and stored in a folder named after the harvest rack (sample set) ID. FIG. 8 d summarizes the HPLC/FLD parameters used to analyze fraction 4 samples.

F. Data Analysis & Hit Detection. Two complementary methods were used to identify modifications in the metabolic profile of test samples relative to reference samples. These data analysis methods are called automated data analysis (ADA) and quantitative data analysis. Each fraction from each sample was analyzed by one or both of these methods to identify hits. If either method identified a fraction as a hit, the sample was called a hit for that fraction. Therefore, a sample could be a hit for anywhere from 1 to 4 fractions.

ADA employs a qualitative pattern recognition approach using ABNORM (U.S. Pat. No. 5,592,402), which is a proprietary software utility of the Dow Chemical Company. ADA was performed on chromatograms from all 4 fractions. The ADA process developed a statistical model from chromatograms that ideally depict unaltered (reference) metabolic profiles. This model was then used to identify test sample chromatograms that contain statistically significant differences from the normal (control) chromatograms. Updated models for each fraction were generated for each sample set. Chromatograms identified as hits by ADA were manually reviewed and the data quality visually verified.

Quantitative data analysis is based on individual peak areas. Quantitative data analysis was applied to specific compounds of interest in fraction 2 (fatty acids) and fraction 4 (amino acids). The peak areas corresponding to these compounds in these fractions were generated. For fraction 2, the relative percent of the peak areas for the compounds in Table 12 was calculated for each sample. The average (x̄) and standard deviation (STD) of the relative percent of the peak areas for the individual compounds were calculated from the reference sample chromatograms analyzed within the sample set. The average and STD were used to calculate a range for each compound. Depending on the compound, this range was typically x̄ ± 3 or 5 STDs. If the relative percent of the peak area from an unknown was outside this range, the compound was considered to be significantly different from the ‘normal’ level and the sample was identified as a hit for F2. For fraction 4, the concentration, in micrograms/gram, was calculated for each of the amino acids listed in Table 12, from calibration standards analyzed at the same time as the test samples. The amino acid concentrations from reference samples were used to calculate the acceptable range from the x̄ and STD for each amino acid. If the amino acid concentration for an unknown fell outside this range, the amino acid was considered to be different from normal and the sample was identified as a hit for F4.
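The range test described above can be illustrated with a minimal Python sketch. The function names, the use of the sample standard deviation, and the numeric values are assumptions made for illustration only; they are not taken from the original analysis software.

```python
from statistics import mean, stdev


def relative_percent(peak_areas):
    """Express each compound's peak area as a percent of the summed areas."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def acceptance_range(reference_values, n_std=3.0):
    """Return (low, high) = average -/+ n_std * STD of the reference values."""
    avg = mean(reference_values)
    std = stdev(reference_values)  # sample STD; the text does not say which form was used
    return avg - n_std * std, avg + n_std * std


def is_hit(test_value, reference_values, n_std=3.0):
    """A value outside the reference range marks the compound as different."""
    low, high = acceptance_range(reference_values, n_std)
    return test_value < low or test_value > high


# Hypothetical relative percents for one fatty acid in five reference samples.
reference = [10.0, 10.5, 9.5, 10.2, 9.8]
print(acceptance_range(reference))       # range at 3 STDs
print(is_hit(10.4, reference))           # inside the range
print(is_hit(14.0, reference))           # outside the range
print(is_hit(11.5, reference, n_std=5))  # a wider 5-STD range accepts this value
```

The same check applies to fraction 4 if the amino acid concentrations (micrograms/gram) are passed in place of relative percents.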

TABLE 12 Tobacco Metabolites Monitored in Fractions 2 and 4 by Quantitative Analysis

Fraction 2 (Fatty Acids)
undecanoic acid methyl ester*              C11:0
Pentadecanoic acid methyl ester**          C15:0
Pentadecanoic acid ethyl ester**           C15:0
palmitic acid methyl ester                 C16:0
palmitoleic acid methyl ester              C16:1
iso methylpentadecanoic acid methyl ester  C16:0:Me
palmitoleic acid methyl ester              C16:2
palmitolenic acid methyl ester             C16:3
iso methylhexadecanoic acid methyl ester   C17:0Me
Stearic acid methyl ester                  C18:0
Oleic acid methyl ester                    C18:1
Linoleic acid methyl ester                 C18:2
Linolenic acid methyl ester                C18:3
Arachidic acid methyl ester                C20:0
Lignoceric acid methyl ester*              C24:0

Fraction 4 (Amino Acids)
Aspartic Acid  ASP
Glutamic Acid  GLU
Serine         SER
Histidine      HIS
Glycine        GLY
Threonine      THR
Alanine        ALA
Arginine       ARG
Tyrosine       TYR
Cystine        CY2
Valine         VAL
Methionine     MET
Norvaline*     NVA
Tryptophan     TRP
Phenylalanine  PHE
Isoleucine     ILE
Leucine        LEU
Lysine         LYS
Sarcosine*     SAR
Proline        PRO

*Internal Standard
**Surrogate Standard

Shipping Hits. Any F1, F2, or F3 fractions identified as hits by ADA or quantitative analysis, and the most typical null for each fraction for each sample set as identified by ADA, were sent to the Function Discovery Laboratory (see Example 20) for structural characterization of the specific compounds identified. Samples were sealed, packaged on dry ice and shipped for overnight delivery.

Example 20 Identification of Metabolic Changes

This Example describes the identification of the chemical nature of genetic modifications made in tobacco plants using GENEWARE viral vector technology. The protocols involved the use of gas chromatography/mass spectrometry (GC/MS) for the analyses of three primary fractions obtained from extraction and fractionation processes.

A. Methods. Major instruments and accessories used included Bioinformatics computer programs, mass spectral libraries, Biotech databases, Nautilus LIMS system (BLIMS; Dow), Biotech Database (eBRAD; Dow), HP Model 6890 capillary Gas Chromatograph (GC; Agilent Technologies), HP Model 5973 Mass Selective Detector (MSD; Agilent Technologies), Auto Sampler and Sample Preparation Station (Leap Technologies), Large Volume Injector system (APEX), Ultra Freezer (Revco), and model LS1006 Barcode Reader (Symbol Technologies).

Samples and corresponding References (also referred to as controls or nulls) were shipped via overnight mail. Samples were removed from the shipping container, inspected for damage, and then placed in a freezer until analysis by GC/MS.

Samples were received in vials or in titer plates with a bar-coded titer plate (TP) number, also referred to as a Rack Identification number, that is used to track the sample in the BLIMS system. The barcode number is used by the Function Discovery Laboratory (FDL) to extract from BLIMS pertinent information from ADA (Automated chromatographic pattern recognition Data Analysis) HIT reports and/or QUANT (a quantitative data analysis approach that makes use of individual peak areas of select peaks corresponding to specific compounds of interest in the fatty acid Fraction 2) HIT reports generated by the Metabolic Screening Laboratory. The information in these reports includes the well position of the respective HITs (Samples), the corresponding well position of the Reference, and other pertinent information, such as aliquot identification. This information is used to generate ChemStation and Leap sequences for FDL analyses.

Samples were sequenced for analysis in the following order:

TABLE 13 Analysis Order

Solvent Blank
Instrument Performance Standard
Samples and Associated Reference
. . .
Performance Standard
Solvent Blank

Samples were analyzed on GC/MS systems using the following procedures. Fraction 1 samples were shipped dry and required a hexane reconstitution step. Fraction 2 and Fraction 3 samples were analyzed as received. Internal standards were added to the samples prior to analysis.

B. Fraction 1 Analysis. The name of the GC/MS method used is BIONEUTx (where x is a revision number of the core GC/MS method). The method is retention-time locked to the retention time of pentacosane, an internal standard, using the ChemStation RT Locking algorithm.

Internal Standard(s)

-   Pentacosane
-   Hexatriacontane

Chromatography
  Column: J&W DB-5MS, 50 m × 0.320 mm × 0.25 μm film
  Mode: constant flow
  Flow: 2.0 mL/min
  Detector: MSD
  Outlet psi: vacuum
  Oven: 40° C. for 2.0 min; 20° C./min to 350° C., hold 15.0 min
  Equilibration time: 1 min
Inlet
  Mode: split
  Inj Temp: 250° C.
  Split ratio: 50:1
  Gas Type: Helium
LEAP Injector
  Inj volume: optimized to pentacosane peak intensity (typically 20 μL)
  Sample pumps: 2
  Wash solvent A: Hexane
  Wash solvent B: Acetone
  Preinj Solvent A washes: 2
  Preinj Solvent B washes: 2
  Postinj Solvent A washes: 2
  Postinj Solvent B washes: 2
APEX Injector
  Method Name: BIONEUTx (where x is a revision number of the core APEX method)
  Modes:
    Initial: Standby (GC Split)
    Splitless: (Purge Off) 0.5 min
    GC Split: (Standby) 4 min
    ProSep Split: (Flow Select) 23 min
  Temps: 50° C. for 0.0 min; 300° C./min to 350° C., hold for 31.5 min
Mass Spectrometer
  Scan: 35-800 Da at sampling rate 2 (1.96 scans/sec)
  Solvent delay: 4.0 min
  Detector: EM absolute: False; EM offset: 0
  Temps: Transfer line: 280° C.; Ion source: 150° C.; MS Source: 230° C.
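As a worked check on the Fraction 1 temperature programs listed above, the following Python sketch adds up the hold and ramp times. The helper name and argument layout are illustrative assumptions; only the temperatures, rates, and hold times come from the parameter list.

```python
def program_minutes(initial_temp, initial_hold, ramps):
    """Total duration (min) of a temperature program.

    ramps is a list of (rate in deg C/min, target temp in deg C, hold in min).
    """
    total = initial_hold
    temp = initial_temp
    for rate, target, hold in ramps:
        total += abs(target - temp) / rate + hold  # ramp time plus hold time
        temp = target
    return total


# Fraction 1 GC oven: 40 C for 2.0 min, 20 C/min to 350 C, hold 15.0 min
oven = program_minutes(40.0, 2.0, [(20.0, 350.0, 15.0)])
# Fraction 1 APEX injector: 50 C for 0.0 min, 300 C/min to 350 C, hold 31.5 min
apex = program_minutes(50.0, 0.0, [(300.0, 350.0, 31.5)])
print(oven, apex)  # both programs run for 32.5 min
```

Under these settings the oven and injector programs for Fraction 1 have the same total duration.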

C. Fraction 2 Analysis. The name of the GC/MS method used is BIOFAMEx (where x is a revision number of the core GC/MS method). The method is retention-time locked to the RT of undecanoic acid, methyl ester, an internal standard, using the ChemStation RT Locking algorithm.

Internal Standard(s)

-   Undecanoic acid, methyl ester
-   Tetracosanoic acid, methyl ester

Chromatography
  Column: J&W DB-23 FAME, 60 m × 0.250 mm × 0.15 μm film
  Mode: constant flow
  Flow: 2.0 mL/min
  Detector: MSD
  Outlet psi: vacuum
  Oven: 50° C. for 2.0 min; 20° C./min to 240° C., hold 10.0 min
  Equilibration time: 1 min
Inlet
  Mode: split
  Inj Temp: 240° C.
  Split ratio: 50:1
  Gas Type: Helium
LEAP Injector
  Inj volume: optimized to undecanoic acid, methyl ester peak intensity (typically 10 μL)
  Sample pumps: 2
  Wash solvent A: Methanol
  Wash solvent B: Methanol
  Preinj Solvent A washes: 2
  Preinj Solvent B washes: 2
  Postinj Solvent A washes: 2
  Postinj Solvent B washes: 2
APEX Injector
  Method Name: BIOFAMEx (where x is a revision number of the core APEX method)
  Modes:
    Initial: GC Split
    Splitless: 0.5 min
    GC Split: 4 min
    ProSep Split: 21 min
  Temps: 60° C. for 0.5 min; 300° C./min to 250° C., hold for 20 min; 300° C./min to 260° C., hold for 5 min
Mass Spectrometer
  Scan: 35-800 Da at sampling rate 2 (1.96 scans/sec)
  Solvent delay: 4.5 min
  Detector: EM absolute: False; EM offset: 0
  Temps: Transfer line: 200° C.; Ion source: 150° C.; MS Source: 230° C.

D. Fraction 3 Analysis. The name of the GC/MS method used is BIOAQUAx (where x is a revision number of the core GC/MS method). The method is retention-time locked to the RT of n-Octyl-β-D-Glucopyranoside, an internal standard, using the ChemStation RT Locking algorithm.

Internal Standard(s)

-   n-Octyl-β-D-Glucopyranoside

Chromatography
  Column: Chrompack 7454 CP-SIL 8, 60 m × 0.320 mm × 0.25 μm film
  Mode: constant flow
  Flow: 2.0 mL/min
  Detector: MSD
  Outlet psi: vacuum
  Oven: 40° C. for 2.0 min; 20° C./min to 350° C., hold 10.0 min
  Equilibration time: 1 min
Inlet
  Mode: split
  Inj Temp: 250° C.
  Split ratio: 50:1
  Gas Type: Helium
LEAP Injector
  Inj volume: optimized to n-Octyl-β-D-Glucopyranoside peak intensity (typically 2.5 μL)
  Sample pumps: 2
  Wash solvent A: Hexane
  Wash solvent B: Acetone
  Preinj Solvent A washes: 2
  Preinj Solvent B washes: 2
  Postinj Solvent A washes: 2
  Postinj Solvent B washes: 2
APEX Injector
  Method Name: BIOAQUAx (where x is a revision number of the core APEX method)
  Modes:
    Initial: GC Split
    Splitless: 0.5 min
    GC Split: 4 min
    ProSep Split: 20 min
  Temps: 60° C. for 0.5 min; 300° C./min to 350° C., hold for 21.1 min
Mass Spectrometer
  Scan: 35-800 Da at sampling rate 2 (1.96 scans/sec)
  Solvent delay: 4.0 min
  Detector: EM absolute: False; EM offset: 0
  Temps: Transfer line: 280° C.; Ion source: 150° C.; MS Source: 230° C.

E. Performance Standard. Two mixtures were used as instrument performance standards. One standard was run with Fraction 1 and 3 samples and the second was run with Fraction 2 samples. Below is the composition of the standards as well as approximate retention time values observed when run under the GC/MS conditions previously described. These retention time values are subject to change depending upon specific instrument and chromatographic conditions.

TABLE 14 Fraction 1 and 3 Performance Standard

Time   Compound
6.25   dimethyl malonate
7.25   dimethyl succinate
8.15   dimethyl glutarate
8.98   dimethyl adipate
11.06  dimethyl azelate
11.42  hexadecane
11.70  dimethyl sebacate
13.57  eicosane
15.36  tetracosane
16.88  octacosane
18.26  dotriacontane
19.95  hexatriacontane

TABLE 15 Fraction 2 Performance Standard

Time   Compound
8.82   undecanoic acid, methyl ester
9.32   dodecanoic acid, methyl ester
10.24  tetradecanoic acid, methyl ester
11.07  hexadecanoic acid, methyl ester
11.84  octadecanoic acid, methyl ester
11.90  oleic acid, methyl ester
12.14  linoleic acid, methyl ester
12.39  linoleic acid, methyl ester
12.60  eicosanoic acid, methyl ester
13.42  docosanoic acid, methyl ester

F. Data Analysis. Sample and Reference data sets were processed using the Bioinformatics computer program Maxwell. The principal elements of the program are 1) Data Reduction, 2) two-dimensional Peak Matching, 3) Quantitative Peak Differentiation (Determination of Relative Quantitative Change), 4) Peak Identification, 5) Data Sorting, and 6) Customized Reporting.

The program queries the user for the filenames of the Reference data set and Sample data set(s) to compare against the Reference. A complete listing of user inputs with example input is shown below.

TABLE 16 Bioinformatics Analysis

USER QUERY: EXAMPLE USER INPUT
  Operator Name: M. Maxwell
  Total number of data files to process: 5
  Which Fraction: 3
  Reference (Control) File Name: AAPR0020.D
  Process a specific RT Range: Y
  Specific RT range: 6.5–23
  Internal Standard Retention Time: 14.902
  +/− variation in Internal Std. RT: .004
  Variation in peak RI, ChemStation: .005
  Percent variation in peak RI, Biotech Database: .010
  Threshold for determining Area % change: 60
  Spectral Matching Value (Threshold MS-XCR for peaks to be a match): .95
  Percent to determine LOP-PM* Value: 1
  Percent to determine LOP-SRT** Value: 3
  Quality Level for Library (Library match): 80
  Subtract Background: Y
  Time Range for Background: 21.5–22.6
  SHORT SUMMARY (y/n, y = no chromatograms): Y

*LOP-PM: Limit of Processing for Peak Matching
**LOP-SRT: Limit of Processing for Sorting

The program integrates the Total Ion Chromatogram (TIC) of the data sets using Agilent Technologies HP ChemStation integrator parameters determined by the analyst. The corresponding raw peak areas are then normalized to the respective Internal Standard peak area. It should be noted that before the normalization is performed, the program chromatographically and spectrally identifies the Internal Standard peak. Should the identification of the Internal Standard not meet established criteria for a given Fraction, then the data set will not be further processed and it will be flagged for analyst intervention.

Peak tables from the Reference and each Sample were generated. The peak tables comprise retention time (RT), retention index (RI, the retention time relative to the Internal Standard RT), raw peak areas, peak areas normalized to the Internal Standard, and other pertinent information.

The first of two filtering criteria, established by the analyst, was then invoked and had to be met before a peak was further processed. The criterion is based upon a peak's normalized area. All normalized peaks having values below the Limit of Processing for Peak Matching (LOP-PM) were considered to be “background”. These “peaks” were not carried forth for any type of mathematical calculation or spectral comparison.
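The normalization, retention index, and first filtering step described above can be sketched in Python as follows. This is an illustrative sketch only: the `Peak` class and `build_peak_table` function are hypothetical names, the RI is assumed to be the simple ratio of peak RT to Internal Standard RT, and the LOP-PM cutoff is passed directly as a normalized-area value because the text does not state how the percent input in Table 16 is converted.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    rt: float               # retention time (min)
    raw_area: float         # integrated TIC peak area
    ri: float = 0.0         # retention index: RT relative to Internal Standard RT
    norm_area: float = 0.0  # raw area divided by Internal Standard area


def build_peak_table(peaks, is_rt, is_area, lop_pm):
    """Normalize peaks to the Internal Standard and drop background peaks."""
    table = []
    for peak in peaks:
        norm_area = peak.raw_area / is_area
        if norm_area < lop_pm:
            continue  # below LOP-PM: treated as background, not processed further
        table.append(Peak(peak.rt, peak.raw_area, peak.rt / is_rt, norm_area))
    return table


# Hypothetical peaks; 14.902 min is the example Internal Standard RT from Table 16.
raw = [Peak(7.451, 500.0), Peak(10.0, 5.0), Peak(14.902, 1000.0)]
table = build_peak_table(raw, is_rt=14.902, is_area=1000.0, lop_pm=0.01)
for row in table:
    print(f"RT={row.rt:.3f} RI={row.ri:.3f} norm={row.norm_area:.3f}")
```

In this example the middle peak has a normalized area of 0.005, falls below the assumed cutoff, and is excluded from the peak table.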

In the initial peak-matching step, the Sample peak table was compared to the Reference peak table and peaks between the two were paired based upon their respective RI values matching one another (within a given variable window). The next step in the peak matching routine utilized mass spectral data. Sample and Reference peaks that had been chromatographically matched were then compared spectrally. The spectral matching was performed using a mass spectral cross-correlation algorithm within the Agilent Technologies HP ChemStation software. The cross-correlation algorithm generates an equivalence value based upon spectral “fit” that was used to determine whether the chromatographically matched peaks are spectrally similar or not. This equivalence value is referred to as the MS-XCR value and must meet or exceed a predetermined value for a pair of peaks to be “MATCHED,” which means they appear to be the same compound in both the Reference and the Sample. The MS-XCR value can also be used to judge peak purity. This two-dimensional peak matching process was repeated until all potential peak matches were processed. At the end of the process, peaks are categorized into two categories, MATCHED and UNMATCHED.
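The two-dimensional matching logic (RI window first, spectral similarity second) can be sketched as follows. The ChemStation cross-correlation algorithm is not described in this text, so a plain cosine similarity between spectra is substituted here as a stand-in; the function names, the greedy best-score pairing, and the example spectra are likewise assumptions for illustration.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Similarity of two spectra given as {m/z: intensity} (stand-in for MS-XCR)."""
    keys = set(spec_a) | set(spec_b)
    dot = sum(spec_a.get(k, 0.0) * spec_b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_peaks(sample, reference, ri_window=0.005, min_similarity=0.95):
    """Pair Sample and Reference peaks, each given as a list of (ri, spectrum)."""
    matched, unmatched_sample, used = [], [], set()
    for s_idx, (s_ri, s_spec) in enumerate(sample):
        best = None
        for r_idx, (r_ri, r_spec) in enumerate(reference):
            # Dimension 1: chromatographic match within the RI window.
            if r_idx in used or abs(s_ri - r_ri) > ri_window:
                continue
            # Dimension 2: spectral match must meet the threshold.
            score = cosine_similarity(s_spec, r_spec)
            if score >= min_similarity and (best is None or score > best[1]):
                best = (r_idx, score)
        if best is None:
            unmatched_sample.append(s_idx)
        else:
            used.add(best[0])
            matched.append((s_idx, best[0], best[1]))
    unmatched_reference = [i for i in range(len(reference)) if i not in used]
    return matched, unmatched_sample, unmatched_reference


reference = [(0.500, {57: 100, 71: 60, 85: 40}), (0.800, {73: 100, 147: 30})]
sample = [(0.502, {57: 98, 71: 62, 85: 39}),   # same RI window, similar spectrum
          (0.801, {91: 100, 65: 20}),          # same RI window, different spectrum
          (0.900, {73: 100})]                  # no Reference peak in the RI window
matched, unmatched_sample, unmatched_reference = match_peaks(sample, reference)
print(matched, unmatched_sample, unmatched_reference)
```

Only the first Sample peak passes both tests; the second fails the spectral comparison and the third has no chromatographic partner, so both end up UNMATCHED, as does the second Reference peak.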

A second filtering criterion was next invoked, again based upon the normalized area of the MATCHED or UNMATCHED peak. For a peak to be reported and further processed, its normalized area must meet or exceed the predetermined Limit of Processing for Sorting (LOP-SRT).

Peaks that are UNMATCHED are immediately flagged as different. UNMATCHED peaks are of two types. The first type comprises those that are reported in the Reference but appear to be absent in the Sample (based upon criteria for quantitation and reporting). These peaks were designated in the Analyst Report with a percent change of “−100 percent” and the description “UNMATCHED IN SAMPLE.” The second type comprises those that were not reported in the Reference (again, based upon criteria for quantitation and reporting) but were reported in the Sample, thus appearing to be “new” peaks. These peaks were designated in the Analyst Report with a percent change of “100 percent” and the description “NEW PEAK UNMATCHED IN NULL.”

MATCHED peaks were processed further for relative quantitative differentiation. This quantitative differentiation is expressed as a percent change of the Sample peak area relative to the area of the Reference peak. A predetermined threshold for change must be observed for the change to be determined biochemically and statistically significant. The change threshold is based upon previously observed biological and analytical variability factors. Only changes above the threshold for change were reported.
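The percent change calculation for MATCHED peaks can be written out as a short Python sketch. The function names are hypothetical, the default threshold of 60 is the example value from Table 16, and applying the threshold to the absolute value of the change (so that decreases are reported as well as increases) is an assumption.

```python
def percent_change(sample_area, reference_area):
    """Percent change of the Sample peak area relative to the Reference peak area."""
    return 100.0 * (sample_area - reference_area) / reference_area


def is_reportable(sample_area, reference_area, threshold=60.0):
    """Report only changes whose magnitude is above the threshold (assumption)."""
    return abs(percent_change(sample_area, reference_area)) > threshold


print(percent_change(2.0, 1.0))   # 100.0: the component doubled
print(percent_change(0.5, 1.0))   # -50.0: the component halved
print(is_reportable(1.5, 1.0))    # False: a 50 percent change is below the threshold
print(is_reportable(2.0, 1.0))    # True
```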

Peaks were then processed through the peak identification process as follows. The mass spectra of the peaks were first searched against mass spectral plant metabolite libraries. The equivalence value assigned to the library match was used as an indication of a proper identification.

To provide additional confirmation to the identity of a peak, or to suggest other possibilities, library hits were searched further against a Biotechnology database. The Biotechnology database is based on the Access database program from Accelrys (formerly Synopsis) and utilizes Accord for Access (also available from Accelrys) to incorporate chemical structures into the database.

The Chemical Abstract Services (CAS) number of the compound from the library was searched against those contained in the database. If a match was found, the CAS number in the database was then correlated to the data acquisition method for that record. If the method was matched, the program then compared the retention index (RI), in the Peak Table, of the component against the value contained in the database for that given method. Should the RIs match (within a given window of variability), then the peak identity was given a high degree of certainty. Components in the Sample that are not identified by this process were assigned a unique identifier based upon Fraction Number and RI (example: F1-U0.555). The unique identifier was used to track unknown components. The program then sorts the data and generates an Analyst Report.
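The confirmation logic above (CAS number, then acquisition method, then RI) can be sketched as a simple lookup. Everything in this sketch is hypothetical: the dictionary layout standing in for the Biotechnology database, the placeholder CAS number, the method revision name, the absolute RI window, and the three-decimal format of the unknown identifier (inferred only from the example F1-U0.555).

```python
def unknown_id(fraction, ri):
    """Build an identifier such as F1-U0.555 from Fraction Number and RI."""
    return f"F{fraction}-U{ri:.3f}"


def identify(cas, method, ri, fraction, database, ri_window=0.010):
    """Confirm a library hit against a {(cas, method): expected RI} mapping."""
    expected_ri = database.get((cas, method))  # None if CAS or method is absent
    if expected_ri is not None and abs(ri - expected_ri) <= ri_window:
        return "high certainty", cas
    # Not confirmed: tracked as an unknown component (a simplification).
    return "criteria not met", unknown_id(fraction, ri)


# Placeholder record: "000-00-0" is not a real CAS number.
database = {("000-00-0", "BIONEUT1"): 0.700}
print(identify("000-00-0", "BIONEUT1", 0.702, 1, database))
print(identify("000-00-0", "BIONEUT1", 0.555, 1, database))
```

The first call falls inside the assumed RI window and is confirmed; the second does not and receives the identifier F1-U0.555.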

An Analyst Report is an interim report consisting of PBM algorithm match quality value (equivalence value), RT, Normalized Peak Area, RI (Sample), RI (database), Peak Identification status [peak identity of high certainty (peaks were identified by the program based on the pre-established criteria) or criteria not met (program did not positively identify the component)], Component Name, CAS Number, Mass Spectral Library (containing spectrum most closely matched to that of the component), Unknown ID (unique identifier used to track unidentified components), MS-XCR value, Relative % Change, Notes (MATCHED/UNMATCHED), and other miscellaneous information. The Analyst Report was reviewed manually by the analyst, who determined what further analysis was necessary. The analyst also generated a modified report, for further processing by the program, by editing the Analyst Report accordingly.

For Fractions 2 and 3, derivatization procedures were performed prior to analysis to make certain components more amenable to gas chromatography. Thus, the compound names in the modified analyst report (MAR) were those of the derivatives. To accurately reflect the true components of these fractions, the MAR was further processed using information contained in an additional database. This database cross-references the observed derivatized compound to that of the original, underivatized “parent” compound by way of their respective CAS numbers and replaces derivatives with parent names and information for the final report. In addition, any unidentified components were assigned a “999999-99-9” CAS number.

The Modified Analyst Report also contains a HIT Score of 0, 1, or 2. The value is assigned by the analyst to the data set of the Sample aliquot based on the following criteria:

0 No FDL data on Sample

1 FDL data collected; Sample not FDL HIT

2 FDL data collected; Sample is FDL HIT

An FDL HIT is defined as a reportable percent change (modification) observed in a Sample relative to Reference in a component of biochemical significance.

An electronic copy of the final report is entered into the Nautilus LIMS system (BLIMS) and subsequently into eBRAD (Biotech database). The program also generated a hardcopy of the pinpointed TIC and the respective mass spectrum of each component that was reported to have changed.

“NQ” and “NEW” are two terms used in the final report. Both terms refer to UNMATCHED peaks whose percent changes cannot be reported in a numerically quantitative fashion. These terms are defined as follows:

-   “NQ” is used in the case where there was a peak reported in the Reference for which there was no match in the Sample (either because there was no peak in the Sample or, if there was, the area of the peak did not satisfy the Limit of Processing for Peak Matching). The percent change designation of “−100%” used in the Analyst Report is replaced with “NQ”.
-   “NEW” is used in those situations where a peak was reported in the Sample but for which there was no corresponding match in the Reference (either because there was no peak in the Reference or, if there was, the area of the peak did not satisfy the Limit of Processing for Peak Matching). For these situations, the percent change designation of “100%” used in the Analyst Report is replaced with “NEW”. The designation of “NEW” in the final report to a component that is present in the Sample but not in the Reference was necessary to eliminate any ambiguity with the appearance of “100%” for MATCHED peaks. A “100%” designation in the final report exclusively refers to a component with modification that doubled in the Sample relative to the Reference.
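The substitution rule above can be summarized in a small Python sketch. The function name and the formatting of numeric changes are assumptions; the status strings are the descriptions used in the Analyst Report.

```python
def final_report_change(status, percent_change=None):
    """Map an Analyst Report entry to its final-report percent change label."""
    if status == "UNMATCHED IN SAMPLE":
        return "NQ"    # replaces the -100% designation of the Analyst Report
    if status == "NEW PEAK UNMATCHED IN NULL":
        return "NEW"   # replaces the 100% designation of the Analyst Report
    return f"{percent_change:.0f}%"  # MATCHED peaks keep their numeric change


print(final_report_change("UNMATCHED IN SAMPLE", -100.0))
print(final_report_change("NEW PEAK UNMATCHED IN NULL", 100.0))
print(final_report_change("MATCHED", 100.0))  # a true doubling stays "100%"
```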

G. Results. The results of the metabolic screening are summarized in FIGS. 10 a-10 ffff. Transfection with 55 of the inserts resulted in measurable metabolic changes.

All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described compositions and methods of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with particular preferred embodiments, it should be understood that the inventions claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in the art and in fields related thereto are intended to be within the scope of the following claims.

1. An isolated nucleic acid of SEQ ID NO:47, wherein expression of said isolated nucleic acid in a plant results in a stunting phenotype of said plant.
2. A vector comprising the isolated nucleic acid of claim 1.
3. The vector of claim 2, wherein said isolated nucleic acid is operably linked to a plant promoter.
4. The vector of claim 2, wherein said isolated nucleic acid is in sense orientation.
5. A transfected plant comprising an isolated nucleic acid of SEQ ID NO:47, wherein expression of said isolated nucleic acid in a plant results in a stunting phenotype of said plant.
6. The plant of claim 5, further comprising a vector comprising said isolated nucleic acid sequence.
7. A leaf from said plant of claim 5.